Dear Sir:



This is a request for filing a □ continuation ☒ divisional application under 37 CFR 1.53(b), of pending prior application
serial no. 09/205,817 filed on December 4, 1998, of Charles R. Ill and Scott Bidlingmaier, entitled NOVEL VECTORS 
AND GENES EXHIBITING INCREASED EXPRESSION which claims priority to provisional application serial 
no. 60/071,596, filed January 16, 1998, and also claims priority to provisional application serial no. 60/067,614, filed on 
December 5, 1997; which claims priority to PCT application serial no. PCT/US98/25354, filed November 25, 1998. 

1. ☒ Enclosed is a copy of the latest inventor signed application, including the oath or declaration as originally filed.
The copy of the enclosed papers is as follows:

☒ 41 pages of specification

☒ 5 pages of claims

☒ 1 page of abstract

☒ 41 sheets of drawing (Figures 1-25)

☒ 2 pages of Preliminary Amendment

☒ 3 pages of copy of Transmittal Letter for Diskette of Sequence Listing

☒ 44 pages of Sequence Listing (pages 1-44)

☒ 3 pages of copy of Notice to Comply with Sequence Requirements

☒ 44 pages of substitute Sequence Listing (pages 1-44)

☒ 12 pages of copy of executed declaration, petition and power of attorney.

I hereby verify that the attached papers are a true copy of the prior complete application serial no. 09/205,817 
as originally filed on December 4, 1998 . 



2. ☒ A verified statement to establish small entity status under 37 CFR 1.9 and 1.27, a copy of which is enclosed,
was filed in the prior application and such status is still proper and desired (37 CFR 1.28(a)). 
3. ☒ The filing fee is calculated below:





                              NUMBER OF               NUMBER    SMALL ENTITY              OTHER THAN A SMALL ENTITY
                              CLAIMS FILED            EXTRA     RATE      FEE        OR   RATE      FEE
TOTAL                         * 17   MINUS ** 20      0         x 9 =     $0.00           x 18 =    $.00
INDEP.                        * 2    MINUS *** 3      0         x 39 =    $0.00           x 78 =    $.00
□ MULTIPLE DEPENDENT CLAIMS                                     + 130 =   $0.00           + 260 =   $.00
BASIC FEE                                                                 $345.00                   $.00
TOTAL                                                                     $345.00    OR             $0.00
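The arithmetic behind the fee table can be sketched as follows. This is an illustrative sketch only: the rates, the 20 total claims and the 3 independent claims covered by the basic fee are taken from the small entity column of the table above, while the function and variable names are hypothetical and are not part of any official fee schedule.

```python
# Small entity rates as they appear in the fee table of this transmittal.
SMALL_ENTITY_RATES = {
    "basic_fee": 345.00,
    "per_extra_total_claim": 9.00,
    "per_extra_independent_claim": 39.00,
    "multiple_dependent_surcharge": 130.00,
}

# Claims covered by the basic fee (the "MINUS 20" and "MINUS 3" entries).
TOTAL_CLAIMS_INCLUDED = 20
INDEPENDENT_CLAIMS_INCLUDED = 3


def filing_fee(total_claims, independent_claims,
               has_multiple_dependent=False, rates=SMALL_ENTITY_RATES):
    """Return the filing fee for the given claim counts (hypothetical helper)."""
    extra_total = max(0, total_claims - TOTAL_CLAIMS_INCLUDED)
    extra_independent = max(0, independent_claims - INDEPENDENT_CLAIMS_INCLUDED)
    fee = rates["basic_fee"]
    fee += extra_total * rates["per_extra_total_claim"]
    fee += extra_independent * rates["per_extra_independent_claim"]
    if has_multiple_dependent:
        fee += rates["multiple_dependent_surcharge"]
    return fee


# 17 total claims and 2 independent claims, as entered in the table.
print(filing_fee(17, 2))
```

With 17 total claims and 2 independent claims there are no extra claims, so only the basic fee applies, which agrees with the total entered in the small entity column.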



4. ☒ The Commissioner is hereby authorized to charge any additional fees which may be required in connection with
this communication, or credit any overpayment, to Deposit Account No. 12-0080. A duplicate copy of this
sheet is enclosed.

5. ☒ A check in the amount of $345.00 is enclosed for payment of the filing fee.

6. ☒ Cancel in this application original claims 1-39 of the prior application before calculating the filing fee. (At least
one original independent claim must be retained for filing purposes.)

7. ☒ A preliminary amendment is enclosed. (Claims added by this amendment have been properly numbered
consecutively beginning with the number next following the highest numbered original claims in the prior
application.)

8. ☒ Amend the specification by inserting before the first line the sentences: "This application is a divisional
application of serial no. 09/205,817, filed on December 4, 1998, which claims priority to provisional
application serial no. 60/071,596, filed January 16, 1998, and also claims priority to provisional application
serial no. 60/067,614, filed on December 5, 1997; which claims priority to PCT application serial no.
PCT/US98/25354, filed November 25, 1998. The contents of all of the aforementioned application(s) are
hereby incorporated by reference."

9. □ Please abandon said prior application as of the filing date accorded this application. A duplicate copy of this 

transmittal is enclosed for filing in the prior application file. (May be used if signed by person authorized by 
§1.138 and before payment of base issue fee.) 

10. □ 

11. □ Priority of application serial no. filed on in is 

claimed under 35 U.S.C. §119. 

□ The certified copy has been filed in prior application 

serial no. filed on . 

□ The certified copy will follow. 

12. ☒ The prior application is assigned of record to The Immune Response Corporation.

13. □ A month extension of time has been submitted in the parent application Serial No. in order to 

establish copendency with the present application. 

14. ☒ Also enclosed are: Transmittal Letter for Diskette of Sequence Listing; and Diskette Containing
Sequence Listing.
15. ☒ The power of attorney in the prior application is to

Lahive & Cockfield, LLP

a. ☒ The power appears in the original papers in the prior application.

b. □ Since the power does not appear in the original papers, a copy of the power in
the prior application is enclosed.

c. □ A new power has been executed and is attached.

16. ☒ Address all future communications (May only be completed by applicant, or attorney or agent of record)
to Jane E. Remillard at Customer Number: 000959 whose address is:

17. ☒ Any requests for extensions of time necessary in a parent application for establishing copendency between this
application and a parent application are hereby requested and the Commissioner is authorized to charge any fee
associated with such an extension to Deposit Account No. 12-0080.

18. ☒ Pursuant to 37 CFR 1.821(e), the computer readable form of the sequence listing for this new application is to be
identical with the computer readable form of application serial no. 09/205,817. Please use the computer readable
form of application serial no. 09/205,817 in lieu of filing a duplicate computer readable form in this application.
Pursuant to 37 CFR 1.821(f), the content of the paper copy of the sequence listing for this new application and the
computer readable form of application serial no. 09/205,817 are the same.



Lahive & Cockfield, LLP 
28 State Street 
Boston, Massachusetts 02109 



LAHIVE & COCKFIELD, LLP 
28 State Street 

Boston, Massachusetts 02109 
Tel. (617) 227-7400 



Date: April 20, 2000 




□ assignee of complete interest
☒ attorney or agent of record



Attorney's Docket No.: TTI-180



Applicant or Patentee: Charles R. Ill et al.
Serial or Patent No.: 09/205,817 
Filed or Issued: December 4, 1998 



Title: NOVEL VECTORS AND GENES EXHIBITING INCREASED EXPRESSION



VERIFIED STATEMENT (DECLARATION) CLAIMING SMALL ENTITY STATUS 
(37 CFR 1.9(f) and 1.27(c)) - SMALL BUSINESS CONCERN 

I hereby declare that I am 

□ the owner of the small business concern identified below:

☒ an official of the small business concern empowered to act on behalf of the concern identified below:

NAME OF SMALL BUSINESS CONCERN The Immune Response Corporation

ADDRESS OF SMALL BUSINESS CONCERN 5935 Darwin Court
Carlsbad, CA 92008




I hereby declare that the above identified small business concern qualifies as a small business concern as defined in 13 
CFR 121.12, and reproduced in 37 CFR 1.9(d), for purposes of paying reduced fees to the United States Patent and Trademark 
Office, in that the number of employees of the concern, including those of its affiliates, does not exceed 500 persons. For 
purposes of this statement, (1) the number of employees of the business concern is the average over the previous fiscal year of 
the concern of the persons employed on a full-time, part-time or temporary basis during each of the pay periods of the fiscal 
year, and (2) concerns are affiliates of each other when either, directly or indirectly, one concern controls or has the power to 
control the other, or a third party or parties controls or has the power to control both. 

I hereby declare that rights under contract or law have been conveyed to and remain with the small business concern 
identified above with regard to the invention described in: 

□ the specification filed herewith with title as listed above.

☒ the application identified above.

□ the patent identified above.

If the rights held by the above identified small business concern are not exclusive, each individual, concern or organization 
having rights in the invention is listed below* and no rights to the invention are held by any person, other than the inventor, who 
would not qualify as an independent inventor under 37 CFR 1.9(c) if that person made the invention, or by any concern which 
would not qualify as a small business concern under 37 CFR 1.9(d), or a nonprofit organization under 37 CFR 1.9(e). 

*NOTE: Separate verified statements are required from each named person, concern or organization having rights to the invention averring to their status as
small entities. (37 CFR 1.27) 

NAME 

ADDRESS 



□ INDIVIDUAL □ SMALL BUSINESS CONCERN □ NONPROFIT ORGANIZATION 



NAME 
ADDRESS 



□ INDIVIDUAL □ SMALL BUSINESS CONCERN □ NONPROFIT ORGANIZATION

I acknowledge the duty to file, in this application or patent, notification of any change in status resulting in loss of 
entitlement to small entity status prior to paying, or at the time of paying, the earliest of the issue fee or any maintenance fee due 
after the date on which status as a small entity is no longer appropriate. (37 CFR 1.28(b)) 

I hereby declare that all statements made herein of my own knowledge are true and that all statements made on 
information and belief are believed to be true; and further that these statements were made with the knowledge that willful false
statements and the like so made are punishable by fine or imprisonment, or both, under section 1001 of Title 18 of the United 
States Code, and that such willful false statements may jeopardize the validity of the application, any patent issuing thereon, or 
any patent to which this verified statement is directed. 

NAME OF PERSON SIGNING Dennis J. Carlo 

TITLE OF PERSON OTHER THAN OWNER President and CEO 



ADDRESS OF PERSON SIGNING The Immune Response Corporation, 5935 Darwin Court, Carlsbad, CA 92008
SIGNATURE DATE
IN THE UNITED STATES PATENT AND TRADEMARK OFFICE



In re the application of: Charles R. Ill and 

Scott Bidlingmaier 

Serial No.: Not Yet Assigned 

Filed: Herewith 

For: Novel Vectors and Genes Exhibiting 
Increased Expression 

Attorney Docket No.: TTI-180DV 



Assistant Commissioner for Patents 
Washington, D.C. 20231 

CERTIFICATION UNDER 37 CFR 1.10 

Date of Deposit: April 20, 2000 Mailing Label Number: EL 263 575 845 US 

I hereby certify that this 37 CFR 1.53(b) request and the documents referred to as attached therein are 
being deposited with the United States Postal Service on the date indicated above in an envelope as 
"Express Mail Post Office to Addressee" service under 37 CFR 1.10 and addressed to the Assistant 
Commissioner for Patents, Box Patent Application, Washington, D.C. 20231. 



Nelson F. Barros 




Name of Person Mailing Paper Signature of Person Mailing Paper 

PRELIMINARY AMENDMENT 

Dear Sir: 

Prior to examination of the above-identified application, please amend the specification 
as follows: 

In the specification:

Please replace the original Sequence Listing (pages 1-44) with the enclosed substitute 
Sequence Listing (substitute pages 1-44). No renumbering of the Sequence Listing is required as 
the page numbers have not changed. 



Group Art Unit: Not Yet Assigned 
Examiner: Not Yet Assigned 



Divisional application of 
U.S. Serial No. 09/205,817 






REMARKS 



Applicants submit herewith substitute pages 1-44 which contain a corrected Sequence 
Listing for the above-referenced application, in accordance with 37 C.F.R. 1.821. This 
corrected Sequence Listing was also filed in the parent application, U.S. Serial No. 09/205,817, 
in a response to a Notice to Comply with Requirements for Patent Applications Containing 
Nucleotide Sequence and/or Amino Acid Sequence Disclosures dated June 7, 1999 (copy 
enclosed). 

In addition, Applicants submit herewith a computer-readable form (diskette) of the 
corrected Sequence Listing which is identical in substance to the corrected paper Sequence 
Listing (pages 1-44) submitted herewith. 

No new matter has been added to the application. Accordingly, as the above amendments 
do not affect the issue of patentability, it is respectfully requested that they be entered. 



LAHIVE & COCKFIELD, LLP 
28 State Street 
Boston, MA 02109 
Tel. (617) 227-7400 

Dated: April 20, 2000 



Respectfully submitted, 
NOVEL VECTORS AND GENES EXHIBITING INCREASED EXPRESSION

Related Applications 

This application claims priority to U.S. Serial No. 60/071,596, filed on January 16,
1998, and to U.S. Serial No. 60/067,614, filed on December 5, 1997, the entire contents of both
of which are incorporated herein by reference.

Background of the Invention 

Recombinant DNA technology is currently the most valuable tool known for
producing highly pure therapeutic proteins both in vitro and in vivo to treat clinical diseases.
Accordingly, a vast number of genes encoding therapeutic proteins have been identified and
cloned to date, providing valuable sources of protein. The value of these genes is, however,
often limited by low expression levels.

This problem has traditionally been addressed using regulatory elements, such as
optimal promoters and enhancers, which increase transcription/expression levels of genes.
Additional techniques, particularly those which do not rely on foreign sequences (e.g., viral
or other foreign regulatory elements) for increasing transcription efficiency of cloned genes,
resulting in higher expression, would be of great value.

Accordingly, the present invention provides novel methods for increasing gene
expression, and novel genes which exhibit such increased expression.

Gene expression begins with the process of transcription. Factors present in the cell
nucleus bind to and transcribe DNA into RNA. This RNA (known as pre-mRNA) is then
processed via splicing to remove non-coding regions, referred to as introns, prior to being
exported out of the cell nucleus into the cytoplasm (where it is translated into protein).
Thus, once spliced, pre-mRNA becomes mRNA which is free of introns and contains only
coding sequences (i.e., exons) within its translated region.

Splicing of vertebrate pre-mRNAs occurs via a two step process involving splice site
selection and subsequent excision of introns. Splice site selection is governed by definition
of exons (Berget et al. (1995) J. Biol. Chem. 270(6):2411-2414), and begins with recognition
by splicing factors, such as small nuclear ribonucleoproteins (snRNPs), of consensus
sequences located at the 3' end of an intron (Green et al. (1986) Annu. Rev. Genet. 20:671-708).
These sequences include a 3' splice acceptor site, and associated branch and pyrimidine
sequences located closely upstream of the 3' splice acceptor site (Langford et al. (1983) Cell
33:519-527). Once bound to the 3' splice acceptor site, splicing factors search downstream
through the neighboring exon for a 5' splice donor site. For internal introns, if a 5' splice
donor site is found within about 50 to 300 nucleotides downstream of the 3' splice acceptor
site, then the 5' splice donor site will generally be selected to define the exon (Robberson et
al. (1990) Mol. Cell. Biol. 10(1):84-94), beginning the process of spliceosome assembly.
Accordingly, splicing factors which bind to 3' splice acceptor and 5' splice donor sites
communicate across exons to define these exons as the original units of spliceosome
assembly, preceding excision of introns. Typically, stable exon complexes will only form
and internal introns thereafter be defined if the exon is flanked by both a 3' splice acceptor
site and 5' splice donor site, positioned in the correct orientation and within 50 to 300
nucleotides of one another.
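The exon definition rule described above can be illustrated with a short sketch. It is a deliberate simplification and not a splice site predictor: it assumes that a 3' splice acceptor is represented only by its invariant AG, that a 5' splice donor is represented only by its invariant GT, and it ignores the branch and pyrimidine sequences. The function name and default parameters are hypothetical; only the 50 to 300 nucleotide window comes from the text.

```python
def candidate_exons(seq, min_len=50, max_len=300):
    """List (exon_start, exon_end) index pairs for a DNA string.

    exon_start is the index just after an AG (simplified 3' splice acceptor);
    exon_end is the index of the G of a GT (simplified 5' splice donor).
    A pair is reported only when the donor lies min_len to max_len
    nucleotides downstream of the acceptor.
    """
    seq = seq.upper()
    # Exon begins immediately 3' of the acceptor AG.
    acceptor_ends = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    # Exon ends immediately 5' of the donor GT.
    donor_starts = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    pairs = []
    for start in acceptor_ends:
        for end in donor_starts:
            if min_len <= end - start <= max_len:
                pairs.append((start, end))
    return pairs


# Toy sequence: an AG, 100 filler bases, then a GT.
toy = "C" * 10 + "AG" + "C" * 100 + "GT" + "C" * 10
print(candidate_exons(toy))
```

In the toy sequence the acceptor and donor are 100 nucleotides apart, so the pair is reported; shortening the filler to 10 bases places the donor outside the window and no exon is defined.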

It has also been shown that the searching mechanism defining exons is not a strict 5'
to 3' (i.e., downstream) scan, but instead operates to find the "best fit" to consensus sequence
(Robberson et al., supra, at page 92). For example, if a near-consensus 5' splice donor site is
located between about 50 to 300 nucleotides downstream of a 3' splice acceptor site, it may
still be selected to define an exon, even if it is not consensus. This may explain the variety of
different splicing patterns (referred to as "alternative splicing") which is observed for many
genes.
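The "best fit" idea can be sketched as a simple scoring scan. The sketch below rests on several assumptions that do not come from this document: the donor consensus string (AGGTRAGT, where R is A or G) is a commonly quoted approximation used purely for illustration, the score is a plain fraction of matching positions, and all names are hypothetical. Only the requirement that the invariant GT be present reflects the text.

```python
# Assumed 5' splice donor consensus (illustrative only); R = A or G.
DONOR_CONSENSUS = "AGGTRAGT"
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}


def consensus_score(site, consensus=DONOR_CONSENSUS):
    """Fraction of positions in `site` that match the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    matches = sum(base in IUPAC[sym] for base, sym in zip(site.upper(), consensus))
    return matches / len(consensus)


def best_fit_donor(seq, consensus=DONOR_CONSENSUS):
    """Return (position, score) of the best-scoring window containing the invariant GT."""
    seq = seq.upper()
    width = len(consensus)
    candidates = [
        (consensus_score(seq[i:i + width], consensus), i)
        for i in range(len(seq) - width + 1)
        if seq[i + 2:i + 4] == "GT"  # the invariant GT must be present
    ]
    if not candidates:
        return None
    score, position = max(candidates, key=lambda c: c[0])
    return position, score


print(consensus_score("CCGTAAGT"))         # a near consensus site
print(best_fit_donor("CCCCAGGTAAGTCCCC"))  # window with the best fit
```

A near consensus site scores below 1.0 but can still be the best candidate in its window, which is the behavior the paragraph above attributes to the exon-defining search.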

Summary of the Invention

The present invention provides novel DNAs which exhibit increased expression of a
protein of interest. The novel DNAs also can be characterized by increased levels of
cytoplasmic mRNA accumulation following transcription within a cell, and by novel splicing
patterns. The present invention also provides expression vectors which provide high
tissue-specific expression of DNAs, and compositions for delivering such vectors to cells. The
invention further provides methods of increasing gene expression and/or modifying the
transcription pattern of a gene. The invention still further provides methods of producing a
protein by recombinant expression of a novel DNA of the invention.

In one embodiment, a novel DNA of the invention comprises an isolated DNA (e.g.,
gene clone or cDNA) containing one or more consensus or near consensus splice sites (3'
splice acceptor or 5' splice donor) which have been corrected. Such consensus or near
consensus splice sites can be corrected by, for example, mutation (e.g., substitution) of at
least one consensus nucleotide with a different, preferably non-consensus, nucleotide. These
consensus nucleotides can be located within a consensus or near consensus splice site, or
within an associated branch sequence (e.g., located upstream of a 3' splice acceptor site).
Preferred consensus nucleotides for correction include invariant (i.e., conserved) nucleotides,
including one or both of the invariant bases (AG) present in a 3' splice acceptor site; one or
both of the invariant bases (GT) present in a 5' splice donor site; or the invariant A present in
the branch sequence of a 3' splice acceptor site.

If the consensus or near consensus splice site is located within the coding region of a
gene, then the correction is preferably achieved by conservative mutation. In a particularly
preferred embodiment, all possible conservative mutations are made within a given consensus
or near consensus splice site, so that the consensus or near consensus splice site is as far from
consensus as possible (i.e., has the least homology to consensus as is possible) without
changing the coding sequence of the consensus or near consensus splice site.
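A conservative (silent) correction of this kind can be sketched in code. The example assumes the standard genetic code and an in-frame coding sequence whose first base is the first base of a codon; the function name and the toy sequence are hypothetical and are not taken from the sequences of this application.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def silent_substitutions(cds, pos):
    """Return the bases that can replace cds[pos] without changing the amino acid.

    `cds` must be in frame (index 0 is the first base of a codon).
    """
    cds = cds.upper()
    codon_start = pos - pos % 3
    codon = cds[codon_start:codon_start + 3]
    offset = pos - codon_start
    original_aa = CODON_TABLE[codon]
    options = []
    for base in BASES:
        if base == cds[pos]:
            continue
        mutated = codon[:offset] + base + codon[offset + 1:]
        if CODON_TABLE[mutated] == original_aa:
            options.append(base)
    return options


# Toy example: codons TCA (Ser) and GGT (Gly) create an AG at positions 2-3.
print(silent_substitutions("TCAGGT", 2))  # the A is the third base of a Ser codon
print(silent_substitutions("ATGAGT", 3))  # the A is the first base of a Ser codon
```

In the first toy case the A of the AG is the wobble base of a serine codon, so it can be replaced by T, C or G without changing the encoded protein. In the second case no silent substitution exists at that position, which illustrates why only some invariant bases within a coding region can be corrected conservatively.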

In another embodiment, a novel DNA of the invention comprises at least one
non-naturally occurring intron, either within a coding sequence or within a 5' and/or 3' non-coding
sequence of the DNA. Novel DNAs comprising one or more non-naturally occurring introns
may further comprise one or more consensus or near consensus splice sites which have been
corrected as previously summarized.

In a particular embodiment of the invention, the present invention provides a novel
gene encoding a human Factor VIII protein. This novel gene comprises one or more
non-naturally occurring introns which serve to increase transcription of the gene, or to alter
splicing of the gene. The gene may alternatively or additionally comprise one or more
consensus splice sites or near consensus splice sites which have been corrected, also to
increase transcription of the gene, or to alter splicing of the gene. In one embodiment, the
Factor VIII gene comprises the coding region of the full-length human Factor VIII gene,
except that the coding region has been modified to contain an intron spanning, overlapping or
within the region of the gene encoding the β-domain. This novel gene is therefore expressed
as a β-domain deleted human Factor VIII protein, since all or a portion of the β-domain
coding sequence (defined by an intron) is spliced out during transcription.

A particular novel human Factor VIII gene of the invention comprises the nucleotide
sequence shown in SEQ ID NO:1. Another particular novel human Factor VIII gene of the
invention comprises the coding region of the nucleotide sequence shown in SEQ ID NO:3
(nucleotides 1006-8237). Particular novel expression vectors of the invention comprise the
complete nucleotide sequences shown in SEQ ID NOS: 2, 3 and 4. These vectors include
novel 5' untranslated regulatory regions designed to provide high liver-specific expression of
human Factor VIII protein.

In still other embodiments, the invention provides a method of increasing expression
of a DNA sequence (e.g., a gene, such as a human Factor VIII gene), and a method of
increasing the amount of mRNA which accumulates in the cytoplasm following transcription
of a DNA sequence. In addition, the invention provides a method of altering the transcription
pattern (e.g., splicing) of a DNA sequence. The methods of the present invention each
involve correcting one or more consensus or near consensus splice sites within the nucleotide
sequence of a DNA, and/or adding one or more non-naturally occurring introns into the
nucleotide sequence of a DNA.

In a particular embodiment, the invention provides a method of simultaneously
increasing expression of a gene encoding human Factor VIII protein, while also altering the
gene's splicing pattern. The method involves inserting into the coding region of the gene an
intron which spans, overlaps or is contained within the portion of the gene encoding the
β-domain. The method may additionally or alternatively comprise correcting within either the
coding sequence or the 5' or 3' untranslated regions of the novel Factor VIII gene, one or
more consensus or near consensus splice sites.

In yet another embodiment, the invention provides a method of producing a human
Factor VIII protein, such as a β-domain deleted Factor VIII protein, by introducing an
expression vector containing a novel human Factor VIII gene of the invention into a host cell
capable of expressing the vector, under conditions appropriate for expression, and allowing
for expression of the vector to occur.



Brief Description of the Figures

Figure 1 shows the nucleotide sequence of an RNA intron. The GU of the 5' splice
donor site, the AG of the 3' splice acceptor site, and the A of the Branch are invariant bases
(100% conserved and essential for recognition as splice sites). U is T in a DNA intron. The
Branch sequence is located upstream from the 3' splice acceptor site at a distance sufficient to
allow for lariat formation during spliceosome assembly (typically within 30-60 nucleotides).
N is any nucleotide. Splicing will occur 5' of the GT base pair within the 5' splice donor site,
and 3' of the AG base pair.

Figure 2 shows the conservative correction of a near consensus 3' splice acceptor site.
The correction is made by silently mutating the A of the invariant (conserved) AG base pair
to C, G, or T which does not affect the coding sequence of the intron because Ser is encoded
by three alternate codons.

Figure 3 is a map of the coding region of a β-domain deleted human Factor VIII
cDNA, showing the positions of the 99 silent point mutations which were made within the
coding region (contained in plasmid pDJC) to conservatively correct all near consensus splice
sites. Numbering of nucleotides begins with the ATG start codon of the coding sequence.
Arrows above the map show positions mutated within near consensus 5' splice donor sites.
Arrows below the map show positions mutated within near consensus 3' splice acceptor sites.
Each "B" shown on the map shows a position mutated within a consensus branch sequence.

Figure 4A-4C shows the silent nucleotide substitution made at each of the 99
positions marked by arrows in Figure 3, as well as the codon containing the substitution and
the amino acid encoded.

Figure 5A-5O is a comparison of the coding sequence of (a) plasmid pDJC (top)
containing the coding region of the human β-domain deleted Factor VIII cDNA modified by
making 99 conservative point mutations to correct all near consensus splice sites within the
coding region, and (b) plasmid p25D (bottom) containing the same coding sequence prior to
making the 99 point mutations. Point mutations (substitutions) are indicated by a "V" between
the two aligned sequences and correspond to the positions within the pDJC coding sequence
shown in Figure 3. Plasmid p25D contains the same coding region as does plasmid pCY-2
shown in Figure 7 and referred to throughout the text.

Figure 6 shows a map of plasmid pDJC including restriction sites used for cloning,
regulatory elements within the 5' untranslated region, and the corrected human β-domain
deleted Factor VIII cDNA coding sequence.

Figure 7 shows a map of plasmid pCY-2 including restriction sites used for cloning,
regulatory elements within the 5' untranslated region, and the uncorrected (i.e.,
naturally-occurring) human β-domain deleted Factor VIII cDNA coding sequence. pCY-2 and pDJC
are identical except for their coding sequences.

Figure 8 is a map of the human β-domain deleted Factor VIII cDNA coding region
showing the five sections of the cDNA (delineated by restriction sites) which can be
synthesized (using overlapping 60-mer oligonucleotides) to contain corrected near consensus
splice sites, and then assembled together to produce a new, corrected coding region.

Figure 9 is a schematic illustration of the cloning procedure used to insert an
engineered intron into the coding region of the human Factor VIII cDNA, spanning a
majority of the region of the cDNA encoding the β-domain. PCR fragments were generated
containing nucleotide sequences necessary to create consensus 5' splice donor and 3' splice
acceptor sites when cloned into selected positions flanking the β-domain coding sequence.
The fragments were then cloned into plasmid pBluescript and sequenced. Once sequences
had been confirmed, the fragments creating the 5' splice donor (SD) site were cloned into
plasmids pCY-601 and pCY-6 (containing the full-length human Factor VIII cDNA coding
region) immediately upstream of the β-domain coding sequence, and fragments creating the
3' splice acceptor (SA) site were cloned into pCY-601 and pCY-6 immediately downstream
of the β-domain coding sequence. The resulting plasmids are referred to as pLZ-601 and
pLZ-6, respectively.

Figure 10 is a map of the full-length human Factor VIII gene, showing the A1, A2, B,
A3, C1 and C2 domains. Following expression of the gene, the β-domain is naturally cleaved
out of the protein. The map shows the 5' and 3' splice sites inserted within the B region of
the gene (in plasmid pLZ-6) so that, during pre-mRNA processing of the gene, the majority
of the B region will be spliced out. Segments A2 and A3 of the gene will then be juxtaposed,
coding for amino acids SFSQNPPV at the juncture.

Figure 11 shows the nucleotide sequences of the exon/intron boundaries (SEQ ID
NO:5) flanking the β-domain coding region in plasmid pLZ-6 (containing the full-length
human Factor VIII cDNA). The 5' splice donor site was added so that splicing would occur
5' of the "g" shown at position 2290. The 3' splice acceptor site was added so that splicing
would occur 3' of the "g" shown at position 5147. Following splicing of the intron created
by these splice sites, amino acids Gln-744 and Asn-1639 of the full-length human Factor VIII
protein are brought together, resulting in a deletion of amino acids 745 to 1638 (numbering is
in reference to Ala-1 of the mature human Factor VIII protein following cleavage of the 19
amino acid signal peptide). Capital letters represent nucleotide bases which remain within
exons of the mRNA. Small case letters represent nucleotide bases which are spliced out of
the mRNA as part of the intron.
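The letter-case convention used in Figure 11 lends itself to a small illustration. The sketch below uses a made-up toy sequence, not SEQ ID NO:5, and hypothetical function names; it shows how removing the lower case (intron) bases joins the flanking exons, and it reproduces the count of residues deleted when amino acids 745 to 1638 are removed.

```python
def splice(pre_mrna):
    """Keep upper case (exon) bases and drop lower case (intron) bases."""
    return "".join(base for base in pre_mrna if base.isupper())


def residues_deleted(first, last):
    """Number of amino acids in an inclusive deletion from `first` to `last`."""
    return last - first + 1


# Toy sequence only: an exon ending in a Gln codon (CAG), a lower case
# intron beginning with gt and ending with ag, then an Asn codon (AAT).
toy_pre_mrna = "TTCCAG" + "gtaagtctctttacag" + "AATCCA"
print(splice(toy_pre_mrna))
print(residues_deleted(745, 1638))
```

After splicing, the Gln and Asn codons of the toy sequence are adjacent, mirroring the way Gln-744 and Asn-1639 are brought together; the deletion of amino acids 745 to 1638 corresponds to 894 residues.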


Figure 12 is a map of the coding region of the full-length human Factor VIII gene
showing (a) ATG (start) and TGA (stop) codons, (b) restriction sites within the coding
region, (c) 5' splice donor (SD) and 3' splice acceptor (SA) sites of a rabbit β-globin intron
positioned upstream of the coding region within the 5' untranslated region, (d) 5' splice donor
and 3' splice acceptor sites added within the coding region defining an internal intron
spanning the β-domain.

Figure 13 is a schematic illustration comparing the process of transcription,
expression and post-translational modification for human Factor VIII produced from (a) a
full-length human Factor VIII gene, (b) a β-domain deleted human Factor VIII gene, and (c) a
full-length human Factor VIII gene containing an intron spanning the β-domain coding
region.

Figure 14 is a graphic comparison of human Factor VIII expression for (a) pCY-6
(containing the coding region of the full-length human Factor VIII cDNA, as well as a 5'
untranslated region derived from the second IVS of rabbit beta globin gene), (b) pCY-601
(containing the coding region of the full-length human Factor VIII cDNA, without the rabbit
beta globin IVS), (c) pLZ-6 (containing the coding region of a full-length human Factor VIII
cDNA with an intron spanning the β-domain, as well as the rabbit beta globin IVS), and (d)
pLZ-601 (containing the coding region of a full-length human Factor VIII cDNA with an
intron spanning the majority of the β-domain, without the rabbit beta globin IVS).
Expression is given in nanograms. Transfection efficiencies were normalized to expression
of human growth hormone (hGH). Each bar represents a summary of four separate
transfection experiments.

Figure 15 shows areas within the human Factor VIII transcription unit for sequence
optimization.

Figure 16 shows the optimized intron-split leader sequence within vectors pCY-2,
pCY-6, pLZ-6 and pCY2-SRE5, as well as the secondary structure of the leader sequence
(SEQ ID NO:11) predicted by the computer program RNAdraw™.

Figure 17 is a schematic illustration showing two different RNA export pathways.
The majority of mRNAs in higher eukaryotes contain intronic sequences which are removed
within the nucleus (splicing pathway), followed by export of the mRNA into the cytoplasm.
Mammalian intronless genes, hepadnaviruses (e.g., HBV), and many retroviruses access a
nonsplicing pathway which is facilitated by cellular RNA export proteins (facilitated
pathway).

Figure 18 is a graph showing the effect of a 5' intron and 3' post-transcriptional
regulatory element (PRE) on human Factor VIII expression levels in HuH-7 cells. Plasmid
pCY-2 contains a 5' intron but no PRE. Plasmid pCY-201 is identical to pCY-2, except that
it lacks the 5' intron. Plasmids pCY-401 and pCY-402 are identical to pCY-201, except that
they contain one and two copies of the PRE, respectively. The levels of secreted active
Factor VIII were measured from supernatants collected 48 hours (first bar of each group) or 72
hours (second bar of each group) after transfection by Coatest VIII:C/4 kit from Kabi Inc.
The transfection efficiency of each plasmid was normalized by analysis of human growth
hormone secreted levels.

Figure 19 is a graph comparing human Factor VIII expression in vivo in mice for
plasmids containing various regulatory elements upstream of either the β-domain deleted or
full-length human Factor VIII gene. Plasmid pCY-2 has a 5' untranslated region containing
the liver-specific thyroxin binding globulin (TBG) promoter, two copies of the liver-specific
alpha-1 microglobulin/bikunin (ABP) enhancer, and a modified rabbit β-globin IVS, all
upstream of the human β-domain deleted Factor VIII gene. Plasmid pCY2-SE5 is identical
to pCY-2 except that the TBG promoter was replaced by the endothelium-specific human
endothelin-1 (ET-1) gene promoter, and the ABP enhancers (both copies) were replaced by
one copy of the human c-fos gene (SRE) enhancer. Plasmid pCY-6 is identical to pCY-2,
except that the human β-domain deleted Factor VIII gene was replaced by the full-length
human Factor VIII gene. Plasmid pLZ-6 is identical to pCY-6, except that the full-length
human Factor VIII gene contained an intron spanning the β-domain. Plasmid pLZ-6A is
identical to pLZ-6, except that it contains one corrected near consensus 3' splice acceptor site
(A to C at base 3084 of pCY-6 (SEQ ID NO:3)). Each bar represents an average of five mice.

Figure 20 shows the nucleotide sequence of the human alpha-1 microglobulin/bikunin
(ABP) enhancer. Clustered liver-specific elements are underlined and labeled HNF-1, HNF-3
and HNF-4.

Figure 21 shows the nucleotide sequence of the human thyroxin binding globulin
(TBG) promoter, also containing clustered liver-specific enhancer elements.

Figure 22 shows the nucleotide sequence and secondary structure of an optimized 
leader sequence. 

Figure 23 is a comparison of the nucleotide sequences of the rabbit β-globin IVS
before (top line) and after (bottom line) optimization to contain consensus 5' splice donor, 3'
splice acceptor, branch, and translation initiation sites. Five nucleotides were also changed
from purines to pyrimidines to optimize the pyrimidine track.

Figure 24 contains a list of various endothelium-specific promoters and enhancers,
and characteristics associated with these promoters and enhancers.

Figure 25 is a graph comparing expression of plasmids pCY-2 and p25D in vivo in
mice. Both plasmids contain the same coding sequence (for human β-domain deleted Factor
VIII). Plasmid pCY-2 has an optimized 5' UTR containing two copies of the ABP enhancer,
one copy of the TBG promoter and a leader sequence split by an optimized 5' rabbit β-globin
intron. Plasmid p25D has a 5' UTR containing one copy of the CMV enhancer, one copy of
the CMV promoter, and a leader sequence containing a short (130 bp) chimeric human IgE
intron. Each bar represents an average of 5 mice.

Detailed Description of the Invention 

DEFINITIONS 

The present invention is described herein using the following terms which shall be
understood to have the following meanings:

An "isolated DNA" means a DNA molecule removed from its natural sequence
context (i.e., from its natural genome). The isolated DNA can be any DNA which is capable
of being transcribed in a cell, including for example, a cloned gene (genomic or cDNA clone)
encoding a protein of interest, operably linked to a promoter. Alternatively, the isolated
DNA can encode an antisense RNA.

A "5' consensus splice site" means a nucleotide sequence comprising the following 
bases: MAGGTRAGT, wherein M is (C or A), wherein R is (A or G) and wherein GT is 
essential for recognition as a 5' splice site (hereafter referred to as the "essential GT pair" or
the "invariant GT pair"). 

A "3' consensus splice site" means a nucleotide sequence comprising the following 
bases (Y>8)NYAGG, wherein Y>8 is a pyrimidine track containing at least eight (most 
commonly twelve to fifteen or more) tandem pyrimidines (i.e., C or T (U if RNA)), wherein
N comprises any nucleotide, wherein Y is a pyrimidine, and wherein the AG is essential
for recognition as a 3' splice site (hereafter referred to as the "essential AG pair" or the
"invariant AG pair"). A "3' consensus splice site" is also preceded upstream (at a sufficient
distance to allow for lariat formation, typically at least about 40 bases) by a "branch
sequence" comprising the following seven nucleotide bases: YNYTRAY, wherein Y is a
pyrimidine (C or T), N is any nucleotide, R is a purine (A or G), and A is essential for
recognition as a branch sequence (hereafter referred to as "the essential A" or the "invariant
A"). When all seven branch nucleotides are located consecutively in a row, the branch
sequence is a "consensus branch sequence."

A "near consensus splice site" means a nucleotide sequence which: 

(a) comprises the essential 3' AG pair, and is at least about 50% homologous, more
preferably at least about 60-70% homologous, and most preferably greater than 70%
homologous to a 3' consensus splice site, when aligned with the consensus splice site for
purposes of comparison; or

(b) comprises the essential 5' GT pair, and is at least about 50% homologous, more
preferably at least about 60-70% homologous, and most preferably greater than 70%
homologous to a 5' consensus splice site, when aligned with the consensus splice site for
purposes of comparison.

Homology refers to sequence similarity between two nucleic acids. Homology can be
determined by comparing a position in each sequence which may be aligned for purposes of
comparison. When a position in the compared sequence is occupied by the same nucleotide
base, then the molecules are homologous at that position. A degree of homology between
sequences is a function of the number of matching or homologous positions shared by the
sequences.
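
The position-by-position homology measure described above can be illustrated with a
short Python sketch. It is an illustration only and not part of the disclosed method: the
function name percent_homology and the IUPAC lookup table are assumptions introduced here,
and the candidate is assumed to be already aligned to the nine-base 5' consensus MAGGTRAGT.

```python
# IUPAC codes used by the consensus definitions in the text (M, R, Y, N).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC",    # M is C or A
    "R": "AG",    # R is a purine
    "Y": "CT",    # Y is a pyrimidine
    "N": "ACGT",  # N is any nucleotide
}

FIVE_PRIME_CONSENSUS = "MAGGTRAGT"


def percent_homology(candidate, consensus=FIVE_PRIME_CONSENSUS):
    """Percentage of aligned positions at which candidate matches the consensus."""
    candidate = candidate.upper().replace("U", "T")  # treat RNA input as DNA
    if len(candidate) != len(consensus):
        raise ValueError("candidate must be aligned to the consensus, base for base")
    matches = sum(base in IUPAC[symbol] for base, symbol in zip(candidate, consensus))
    return 100.0 * matches / len(consensus)


print(percent_homology("CAGGTAAGT"))            # every position matches
print(round(percent_homology("CAGGTCCCC"), 1))  # 5 of 9 positions match
```

A candidate matching 5 of the 9 consensus positions scores about 55%, which is the
borderline case discussed in the next paragraph.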

As will be described in more detail below, additional criteria for selecting "near
consensus splice sites" can be used, adding to the definition provided above. For example, if
a near consensus splice site shares homology with a 5' consensus splice site in only 5 out of 9
bases (i.e., about 55% homology), then these bases can be required to be located
consecutively in a row. It can additionally or alternatively be required that a 3' near
consensus splice site be preceded by a consensus branch sequence (i.e., no mismatches
allowed), or followed downstream by a consensus or near consensus 5' splice donor site, to
make the selection more stringent.

The term "corrected" as used herein refers to a near consensus splice site mutated by
substitution of at least one nucleotide shared with a consensus splice site, hereafter referred to
as a "consensus nucleotide". The consensus nucleotide within the near consensus splice site
is substituted with a different, preferably non-consensus nucleotide. This makes the near
consensus splice site "farther from consensus."

If the near consensus splice site is within a coding region of a gene, then the
correction is preferably a conservative mutation. A "conservative mutation" means a base
mutation which does not affect the amino acid sequence coded for, also known as a "silent
mutation." Accordingly, in a preferred embodiment of the invention, correction of a near
consensus splice site located within the coding region of a gene includes making all possible
conservative mutations to consensus nucleotides within the site, so that the near consensus
splice site is as far from consensus as possible without changing the amino acid sequence it
encodes.
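
One way to picture "as far from consensus as possible without changing the amino acid
sequence" is the following Python sketch. It is a minimal illustration under stated
assumptions, not the procedure used in the examples: the function names, the in-frame
toy sequence, and the tie-breaking rule are all hypothetical, and the standard genetic
code is assumed. For each codon overlapping the site, the synonymous codon with the
fewest consensus matches is chosen.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "M": "AC", "R": "AG", "Y": "CT", "N": "ACGT"}


def translate(cds):
    """Translate an in-frame coding sequence (frame starts at index 0)."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def push_from_consensus(cds, site_start, consensus):
    """Return cds with silent changes that minimize consensus matches in the site."""
    site_end = site_start + len(consensus)
    usable = len(cds) - len(cds) % 3
    out = []
    for codon_start in range(0, usable, 3):
        codon = cds[codon_start:codon_start + 3]

        def matches(candidate):
            # Count consensus matches only at codon positions inside the site.
            return sum(
                candidate[k] in IUPAC[consensus[codon_start + k - site_start]]
                for k in range(3)
                if site_start <= codon_start + k < site_end
            )

        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        # Fewest matches wins; ties keep the original codon, then alphabetical order.
        out.append(min(synonyms, key=lambda c: (matches(c), c != codon, c)))
    return "".join(out) + cds[usable:]


original = "ACAGGTAAGTGG"  # hypothetical; bases 1-9 are a full 5' consensus site
corrected = push_from_consensus(original, 1, "MAGGTRAGT")
print(corrected, translate(original) == translate(corrected))
```

Because every base belongs to exactly one codon, choosing the best synonymous codon
independently for each codon gives the minimum number of consensus matches for the site
as a whole. In this toy case the essential GT pair is also eliminated.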

A "Factor VIII gene" as used herein means a gene (e.g., a cloned genomic gene or a 
cDNA) encoding a functional human Factor VIII protein from any species (e.g., human or 
mouse). A Factor VIII gene which is "full-length" comprises the complete coding sequence
of the human Factor VIII gene found in nature, including the region encoding the β-domain.
A Factor VIII gene which "encodes a β-domain deleted Factor VIII protein" or "a β-domain
deleted Factor VIII gene" lacks all or a portion of the region of the full-length gene encoding
the β-domain and, therefore, is transcribed and expressed as a "truncated" or "β-domain
deleted" Factor VIII protein. A gene which "is expressed as a β-domain deleted Factor VIII
protein" includes not only a gene which encodes a β-domain deleted Factor VIII protein, but
also a novel Factor VIII gene provided by the present invention which comprises the coding
region of a full-length Factor VIII gene, except that it additionally contains an intron
spanning the portion of the gene encoding the β-domain. The term "spans" means that the
intron overlaps, encompasses, or is encompassed by the portion of the gene encoding the
β-domain. The portion of the gene spanned by the intron is then spliced out of the gene during
transcription, so that the resulting mRNA is expressed as a truncated or β-domain deleted
Factor VIII protein.

A "truncated" or "β-domain deleted" Factor VIII protein includes any active Factor
VIII protein (human or otherwise) which contains a deletion of all or a portion of the
β-domain.

A "non-naturally occurring intron" means an intron (defined by a 5' splice donor site 
and a 3' splice acceptor site) which has been engineered into a gene, and which is not present
in the natural DNA or pre-mRNA nucleotide sequences of the gene.

An "expression vector" means any DNA vector (e.g., a plasmid vector) containing the
necessary genetic elements for expression of a novel gene of the present invention. These
elements, including a suitable promoter and preferably also a suitable enhancer, are "operably
linked" to the gene, meaning that they are located at a position within the vector which
enables them to have a functional effect on transcription of the gene.

IDENTIFICATION OF CONSENSUS AND NEAR CONSENSUS SPLICE SITES 

A consensus or near consensus splice site can be identified within a DNA, or its
corresponding RNA transcript, by evaluating the nucleotide sequence of the DNA for the
presence of a sequence which is identical or highly homologous to either a 3' consensus
splice acceptor site or a 5' consensus splice donor site (Figure 1). Such consensus and near
consensus sites can be located within any portion of a given DNA (e.g., a gene), including the
coding region of the DNA and any 3' and 5' untranslated regions.

To identify 3' consensus and near consensus splice acceptor sites, a DNA (or
corresponding RNA) sequence is analyzed for the presence of one or more nucleotide
sequences which include an AG base pair, and which are either identical to or at least about
50% homologous, more preferably at least about 60-70% homologous, to the
sequence: (T/C)>8 N(C/T)AGG. In a preferred embodiment, the nucleotide sequence is also
preceded upstream, typically by about 40 bases, by a nucleotide sequence which is identical
to or highly homologous (e.g., at least about 50%-95% homologous) to a branch consensus
sequence comprising the following bases: (C/T)N(C/T)T(A/G)A(C/T), wherein N is any
nucleotide, and A is invariant (i.e., essential). By way of example, in studies described
herein, consensus and near consensus 3' splice sites were selected for correction within a gene
encoding Factor VIII using the following criteria: the consensus or near consensus site (a)
contained an AG pair, and (b) contained no more than three mismatches to a 3' consensus
site.
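
As a rough illustration of the pattern search described above, the following Python
sketch locates only exact matches to the 3' consensus (a run of at least eight
pyrimidines followed by NYAGG) and reports whether a consensus branch sequence lies in a
window upstream. It is a sketch under stated assumptions and does not reproduce the
mismatch-tolerant criteria used in the studies: the function name and the min_gap and
max_gap window parameters are hypothetical choices made for this example.

```python
import re

# Exact 3' consensus: at least eight pyrimidines, then N, Y, and the essential AG plus G.
ACCEPTOR = re.compile(r"[CT]{8,}[ACGT][CT]AGG")
# Branch consensus YNYTRAY.
BRANCH = re.compile(r"[CT][ACGT][CT]T[AG]A[CT]")


def find_consensus_acceptors(seq, min_gap=40, max_gap=100):
    """List exact-consensus 3' sites and whether a branch consensus lies upstream.

    min_gap and max_gap (assumed values) bound how far before the start of the
    pyrimidine run the branch sequence is searched for.
    """
    seq = seq.upper()
    hits = []
    for m in ACCEPTOR.finditer(seq):
        window_start = max(0, m.start() - max_gap)
        window_end = max(0, m.start() - min_gap)
        hits.append({
            "start": m.start(),
            "ag_index": m.end() - 3,  # index of the A in the essential AG pair
            "has_branch": bool(BRANCH.search(seq[window_start:window_end])),
        })
    return hits


# Hypothetical test sequence: branch, 40-base spacer, pyrimidine run, then GCAGG.
demo = "CACTGAC" + "A" * 40 + "TTTTCCCTTT" + "GCAGG"
print(find_consensus_acceptors(demo))
```

Near consensus sites would require a mismatch-tolerant comparison rather than a strict
regular expression; the strict form is used here only to keep the example short.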

To identify 5' consensus and near consensus splice donor sites, a DNA (or
corresponding RNA) sequence can be analyzed for the presence of one or more nucleotide
sequences which contains a GT base pair, and which is either identical to or at least about
50% homologous, more preferably at least about 60-70% homologous, to the sequence:
(A/C)AGGT(A/G)AGT. By way of example, in studies described herein, consensus and
near consensus 5' splice sites were selected for correction within a gene encoding Factor VIII
using the following criteria: the consensus or near consensus site (a) contained a GT pair,
and (b) contained no more than four mismatches to a 5' consensus site, provided that if it
contained four mismatches, they were located consecutively in a row.
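
The 5' selection criteria just described can be sketched in Python as a sliding-window
scan. This is an illustration, not the search actually performed: the function names are
hypothetical, and the rule for four mismatches is read, as an assumption based on the
definitions section, to mean that the five matching bases must form one uninterrupted
run.

```python
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "M": "AC", "R": "AG", "Y": "CT", "N": "ACGT"}
DONOR = "MAGGTRAGT"  # 5' consensus; the essential GT pair sits at offsets 3 and 4


def is_candidate_donor(window):
    """Apply the example criteria: GT present, at most four mismatches."""
    if len(window) != len(DONOR) or window[3:5] != "GT":
        return False
    mask = [base in IUPAC[symbol] for base, symbol in zip(window, DONOR)]
    mismatches = mask.count(False)
    if mismatches < 4:
        return True
    if mismatches > 4:
        return False
    # Exactly four mismatches: require the matching bases to be contiguous
    # (an assumed reading of "located consecutively in a row").
    matched = [i for i, ok in enumerate(mask) if ok]
    return matched[-1] - matched[0] + 1 == len(matched)


def find_donors(seq):
    """Return (index, nine-base window) for every candidate 5' site in seq."""
    seq = seq.upper()
    return [(i, seq[i:i + 9]) for i in range(len(seq) - 8)
            if is_candidate_donor(seq[i:i + 9])]


print(find_donors("TTCAGGTAAGTTT"))
```

Requiring the GT pair first makes the scan cheap, since most windows are rejected before
any mismatch counting is done.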

Evaluation of DNA or RNA sequences for the presence of one or more consensus or
near consensus splice sites can be performed in any suitable manner. For example,
nucleotide sequences can be manually analyzed. Alternatively, a computer algorithm can be
employed to search nucleotide sequences for specified base patterns (e.g., the MacVector™
program). The latter approach is preferred for large DNAs or RNAs, particularly because it
allows for easy implementation of multiple search parameters.

CORRECTION OF CONSENSUS AND NEAR CONSENSUS SPLICE SITES

In one embodiment of the invention, splice and branch sequences which are
consensus, or near consensus, are corrected by substitution of one or more consensus
nucleotides within the site. The consensus nucleotide within the site is preferably substituted
with a non-consensus nucleotide. For example, if the nucleotide being substituted is a C (i.e.,
a pyrimidine) and the consensus sequence contains either C or T, then the nucleotide is
preferably substituted by an A or G (i.e., a purine), thereby making the consensus or near
consensus splice site "farther from consensus."

In a preferred embodiment of the invention, consensus and near consensus sites which
are located within a coding region of a gene are corrected by conservative substitution of one
or more nucleotides so that the correction does not affect the amino acid sequence coded for.
Such conservative or "silent" mutation of codons to preserve coding sequences is well known
in the art. Accordingly, the skilled artisan will be able to select appropriate base substitutions
to retain the coding sequence of any codon which forms all or part of a consensus or near
consensus splice site. For example, as shown in Figure 2, if a 3' near consensus splice site
contains a TCA codon encoding serine, and the A is a consensus nucleotide (e.g., part of the
essential AG pair), then this nucleotide can be substituted with a C, G, or a T to correct the 3'
near consensus splice site (e.g., making it no longer near consensus because it does not
contain the essential AG pair required for a 3' near consensus splice site), without affecting
the coding sequence of the codon.
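
The TCA example can be checked with a few lines of Python. The sketch below assumes the
standard genetic code; the function name silent_substitutions is hypothetical and is
introduced only for this illustration. It lists the bases that can replace a given codon
position without changing the encoded amino acid.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def silent_substitutions(codon, position):
    """Bases that can replace codon[position] without changing the amino acid."""
    codon = codon.upper()
    silent = []
    for base in "ACGT":
        if base == codon[position]:
            continue
        mutant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[mutant] == CODON_TABLE[codon]:
            silent.append(base)
    return silent


print(CODON_TABLE["TCA"], silent_substitutions("TCA", 2))  # serine, third position
print(CODON_TABLE["ATG"], silent_substitutions("ATG", 2))  # methionine, third position
```

For TCA, all three alternative third-position bases still encode serine, in agreement
with the text. A codon such as ATG has no silent alternative, so an essential base that
falls in such a codon cannot be corrected conservatively at that position.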

Accordingly, in a preferred embodiment of the invention, correction of consensus or
near consensus splice sites which are specifically located within the coding region of a gene
is achieved by substitution of one or both bases of an essential AG or GT pair within the
consensus or near consensus splice site, with a base which does not alter the coding sequence
of the site. Correction of consensus or near consensus branch sequences is similarly achieved
by substitution of the essential A within the consensus or near consensus branch site, with a
base which does not alter the coding sequence of the site. By correcting any of these
essential bases, the splice or branch site will no longer be consensus or near consensus.

In another preferred embodiment, correction of consensus or near consensus splice
sites which are specifically located within the coding region of a gene is achieved by making
all possible conservative mutations to consensus nucleotides within the site, so that the
consensus or near consensus splice site is as far from consensus as possible but encodes the
same amino acid sequence.

Other preferred corrections of the invention include corrections of 3' consensus and
near consensus splice sites which are followed downstream (e.g., by approximately 50-350
nucleotides) by a consensus or near consensus 5' splice donor site. Other preferred
corrections of the invention include corrections of 5' consensus and near consensus splice
sites which are preceded upstream (e.g., by about 50-350 nucleotides) by a consensus or near
consensus 3' splice acceptor site.

For consensus or near consensus splice sites which are located outside the coding
region of a gene, for example, in a 3' or 5' untranslated region (UTR), alternative approaches
to correction can also be employed. For instance, because preservation of the coding
sequence is not a consideration, the near consensus splice site can be corrected not only by
any base substitution, but also by addition or deletion of one or more bases within the
consensus or near consensus splice site, making the site farther from consensus.

Techniques for making nucleotide base substitutions, additions and deletions as
described above are well known in the art. For example, standard point mutation may be
employed to substitute one or more bases within a near consensus splice site with a different
(e.g., non-consensus) base. Alternatively, as described in detail in the examples below, entire
genes or portions thereof can be reconstructed (e.g., resynthesized using PCR), to correct
multiple consensus and near consensus splice sites within a particular region of a gene. This
approach is particularly advantageous if a gene contains a high concentration of consensus
and/or near consensus splice sites within a given region.

In a specific embodiment, the invention features a novel Factor VIII gene containing
one or more consensus or near consensus splice sites which have been corrected by
substitution of one or more consensus nucleotides within the site. As part of the present
invention, the coding region of a gene (cDNA) encoding human β-domain deleted Factor
VIII protein (nucleotides 1006-5379 of SEQ ID NO:2) was evaluated as described herein and
found to contain 23 near consensus 5' splice (donor) sequences, 22 near consensus 3' splice
(acceptor) sequences, and 18 consensus branch sequences (shown in Figure 3). A new
coding sequence (SEQ ID NO:1) was then developed for this gene to correct all 3' and 5' near
consensus splice sites by conservative mutation. In total, 99 point mutations were made to
the coding region. The location of each of these point mutations is shown in Figure 3. The
specific base substitution made in each of these point mutations is shown in Figure 4(A-C).

A comparison of this new coding sequence (SEQ ID NO:1) and the original
uncorrected sequence (nucleotides 1006-5379 of SEQ ID NO:2), also showing the positions
and specific substitutions made in each of the ninety-nine point mutations, is shown in Figure
5(A-O). A plasmid vector, referred to as pDJC, containing the new (i.e., corrected) Factor
VIII gene coding sequence, including restriction sites used to synthesize the gene and
regulatory elements used to express the gene, is shown in Figure 6. A plasmid vector,
referred to as pCY2, containing the original, uncorrected Factor VIII gene, including
restriction sites and regulatory elements used to express the gene, is shown in Figure 7.

As described in further detail in the examples below, all 99 consensus base
corrections within the coding region of pDJC can be made by synthesizing overlapping
oligonucleotides (based on the sequence of pCY2 shown in SEQ ID NO:2) which contain the
desired corrections. A schematic illustration of this process is shown in Figure 8. In total,
185 overlapping 60-mer oligonucleotides can be synthesized, and assembled in five segments
using the method of Stemmer et al. (1995) Gene 164: 49-53. Prior to assembly, each segment
can be sequenced and tested in in vitro transfection assays (e.g., nuclear and cytoplasmic
RNA analysis) in pCY2.

As an alternative to the "correct all" approach described above, selective correction of
consensus and near consensus splice sites can also be employed. This involves selecting only
(a) consensus sites, and near consensus splice sites which are close to consensus, and/or (b)
consensus sites and near consensus sites which are located at positions which render these
sites more likely to function as a splice donor or acceptor site. To select only nucleotide
sequences which are complete consensus or which are close to consensus, evaluation of a
given nucleotide sequence is limited to analyzing the nucleotide sequence for sequences
which are identical to or are highly homologous (e.g., greater than 70-80% homologous) to a
3' or 5' consensus splice site. To select only nucleotide sequences which are located at
positions which render these sites more likely to function as a splice donor or acceptor site,
the location of each 3' consensus or near consensus splice site must be evaluated with respect
to the position of any neighboring 5' consensus or near consensus splice sites. If a 3'
consensus or near consensus splice site is located approximately 50-350 bases upstream from
a 5' consensus or near consensus splice site, then these 3' and 5' splice sites are likely to
function as splice acceptor and donor sites. Therefore, these sites are preferably, and
selectively, removed.

By way of example, particular consensus and/or near consensus 5' splice donor and 3'
splice acceptor sites, as shown in Figure 3, can be selected within the coding region of the
cDNA encoding human β-domain deleted Factor VIII (nucleotides 1006-5379 of SEQ ID
NO:2) for preferred correction, based on their relative locations (i.e., 3' splice acceptor site
located approximately 50-350 bases upstream from 5' near consensus splice site). Such
preferred selective corrections can include, for instance, the near consensus 3' splice acceptor
site spanning nucleotide base 1851 of the coding region (see Figure 3) and any of the near
consensus 5' splice donor sites located within 50-350 bases downstream of this near
consensus 3' splice acceptor site, such as those spanning positions 1956, 1959, 2115, 2178
and 2184.
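
The positional criterion can be checked against the positions quoted above with a short
Python sketch. The function name pair_sites and its parameters are hypothetical, and each
site is represented by the single base position given in the text, so the distances are
approximate.

```python
def pair_sites(acceptors, donors, min_gap=50, max_gap=350):
    """Map each 3' acceptor position to the 5' donor positions that lie
    min_gap to max_gap bases downstream of it."""
    pairs = {}
    for acceptor in acceptors:
        nearby = [d for d in sorted(donors) if min_gap <= d - acceptor <= max_gap]
        if nearby:
            pairs[acceptor] = nearby
    return pairs


# Positions taken from the paragraph above (coding-region numbering).
acceptor_sites = [1851]
donor_sites = [1956, 1959, 2115, 2178, 2184]

pairs = pair_sites(acceptor_sites, donor_sites)
for acceptor, nearby in pairs.items():
    for donor in nearby:
        print(acceptor, donor, donor - acceptor)
```

The five donor positions lie 105, 108, 264, 327 and 333 bases downstream of position
1851, so all of them fall within the 50-350 base window.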

Splice site correction as provided herein can be applied to any gene known in the art.
For example, the complete nucleotide sequences of other (e.g., full-length and β-domain
deleted) Factor VIII genes (both genomic clones and cDNAs) are described in US Patent No.
4,757,006, US Patent No. 5,618,789, US Patent No. 5,683,905, and US Patent No. 4,868,112,
the disclosures of which are incorporated by reference herein. The nucleotide sequences of
these genes can be analyzed for consensus and near consensus splice sites, and thereafter
corrected, using the guidelines and procedures provided herein.

In addition, other genes, particularly large genes containing several introns and exons,
are also suitable candidates for splice site correction. Such genes include, for example, the
gene encoding Factor IX, or the cystic fibrosis transmembrane regulator (CFTR) gene
described in US Patent No. 5,240,846, or nucleic acids encoding CFTR monomers, as
described in US Patent No. 5,639,661. The disclosures of both of these patents are
accordingly incorporated by reference herein.

ADDITION OF INTRONS 

In another embodiment, a novel gene of the invention includes one or more
non-naturally occurring introns which have been added to the gene to increase expression of the
gene, or to alter the splicing pattern of the gene. The present invention provides the first
known instance of gene engineering which involved adding a non-naturally-occurring intron
within the coding sequence of a gene, particularly without affecting the activity of the protein
encoded by the gene. The benefit of intron addition in this context is at least two-fold. First,
as shown in Figure 14 in the context of the human Factor VIII gene, addition of one or more
introns into a gene increases the expression of the gene compared to the same gene without
the intron. Second, the intron, when placed within the coding sequence of the gene, can be
used to beneficially alter the splicing pattern of the gene (e.g., so that a particular protein of
interest is expressed), and/or to increase cytoplasmic accumulation of mRNA transcribed
from the gene.

Novel genes of the present invention may also contain introns outside of the coding
region of the gene. For example, introns may be added to the 3' or 5' non-coding regions of
the gene (untranslated regions (UTRs)). In a preferred embodiment of the invention, an intron
is added upstream of the gene in the 5' UTR, as shown in pDJC (Figure 6) and pCY2 (Figure
7). Such introns may include newly engineered introns or pre-existing introns. In a preferred
embodiment of the invention, the intron is derived from the rabbit β-globin intron (IVS).

In a particular embodiment, the invention provides a novel human Factor VIII gene
which includes within its coding region one or more introns. If the gene comprises the
coding region of a full-length human Factor VIII gene, then at least one of these introns
preferably spans (i.e., overlaps, encompasses or is encompassed by) the portion of the gene
encoding the β-domain. This portion of the gene is then spliced out during transcription of
the gene, so that the gene is expressed as a β-domain deleted protein (i.e., a Factor VIII
protein lacking all or a portion of the β-domain).

A β-domain deleted human Factor VIII protein possesses known advantages over a
full-length human Factor VIII protein (also known as human Factor VIII:C), including
reduced immunogenicity (Toole et al. (1986) PNAS 83: 5939-5942). Moreover, it is well
known that the β-domain is not needed for activity of the Factor VIII protein. Thus, a novel
Factor VIII gene of the invention provides the dual benefit of (1) increased and (2) preferred
protein expression.

Addition of one or more introns into a gene can be achieved by adding a 5' splice
donor site and a 3' splice acceptor site (Figure 1) into the nucleotide sequence of the gene at a
desired location. If the intron is being added to remove a portion of the coding sequence
from the gene, then a 5' splice donor site is placed at the 5' end of the portion being removed
(i.e., defined by the intron) and a 3' splice acceptor site is placed at the 3' end of the portion to
be removed. Preferably, the 5' splice donor and 3' splice acceptor sequences are consensus,
including the branch sequence located upstream of the 3' splice site, so that they will be
favored (and more likely bound) by cellular splicing machinery over any surrounding near
consensus splice sites.

As shown in Figure 1, splicing will occur 5' of the essential GT base pair within the 5'
splice donor site, and 3' of the essential AG base pair within the 3' splice acceptor site. Thus,
for introns added to coding sequences of genes, the intron is preferably designed so that, upon
splicing, the coding sequence is unaffected. This can be done by designing and adding 5'
splice donor and 3' splice acceptor sites which include only conservative (i.e., silent) changes
to the nucleotide sequence of the gene, so that addition of these splice sites does not alter the
coding sequence.
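
The splice boundaries described above can be illustrated with a toy Python sketch. The
sequences and the function name splice are hypothetical and written in DNA letters; the
example only shows that removing everything from the essential GT through the essential
AG rejoins the two flanking coding segments unchanged.

```python
def splice(pre_mrna, donor_gt, acceptor_ag):
    """Remove the intron that starts at the essential GT (index donor_gt)
    and ends with the essential AG (which starts at index acceptor_ag)."""
    if pre_mrna[donor_gt:donor_gt + 2] != "GT":
        raise ValueError("no GT pair at the donor position")
    if pre_mrna[acceptor_ag:acceptor_ag + 2] != "AG":
        raise ValueError("no AG pair at the acceptor position")
    if acceptor_ag < donor_gt + 2:
        raise ValueError("acceptor must lie downstream of the donor")
    # Cut 5' of the GT pair and 3' of the AG pair.
    return pre_mrna[:donor_gt] + pre_mrna[acceptor_ag + 2:]


# Hypothetical toy sequences, not taken from any SEQ ID NO.
exon1 = "ATGGCCAAG"                              # ends in AAG, matching MAG
intron = "GTAAGT" + "CACTGAC" + "TTTTTTTTTTTCAG"  # donor, branch, tract, NCAG
exon2 = "GGTTTCTAA"                              # begins with G

pre = exon1 + intron + exon2
mature = splice(pre, len(exon1), len(exon1) + len(intron) - 2)
print(mature == exon1 + exon2)
```

In this sketch the flanking exon bases already agree with the consensus, so no coding
change is needed; in a real gene the flanking codons would be adjusted only by silent
substitutions, as the paragraph above describes.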

For example, as part of the present invention, an intron was engineered into the
coding sequence of a full-length cDNA encoding human Factor VIII (1006-8061 of SEQ ID
NO:4). The intron spanned the portion of the gene encoding the β-domain (nucleotides 2290-
5147 of SEQ ID NO:4, encoding amino acid residues 745-1638). As described in the
examples below, this intron was created by adding a 5' splice donor site (100% consensus) so
that splicing would occur immediately 5' of the coding sequence of the β-domain. A 3' splice
acceptor site was also added so that splicing would occur immediately 3' of the coding
sequence of the β-domain. Figure 11 shows the nucleotide sequences (SEQ ID NO:5) of the
precise boundaries of the resulting intron that was added.

The nucleotide sequence for the 5' splice donor site of the added intron was derived
from the pre-existing splice donor sequence found at the 5' end of IVS (Intron) 13 of genomic
Factor VIII. This intron precedes exon 14, the exon which contains the sequence coding for
the β-domain. The inserted sequence also contained the first nine bases of IVS 13 following
the splice donor sequence.

The sequence for the 3' splice acceptor site was derived from the pre-existing splice
acceptor sequence found at the 3' end of IVS 14 of genomic Factor VIII. This intron follows
exon 14, the β-domain-containing exon. The inserted 3' splice acceptor site also contained
130 bases upstream of the splice acceptor in IVS 14. This upstream region contains at least
two near-consensus branch sequences.

Thus, both the 3' and 5' engineered splice sites were designed to take advantage of
pre-existing nucleotide sequences within the β-domain region of the human Factor VIII gene.

The 5' splice donor, 3' splice acceptor, and branch sequences of the added intron were
further modified so that they were 100% consensus (i.e., congruent to their respective
consensus splicing sequences). Modifications (e.g., base substitutions) were chosen so as to
not alter the coding sequence of bases located upstream of the 5' splice site and downstream
of the 3' splice site (i.e., flanking the boundaries of the intron). A map showing the various
domains of the full-length Factor VIII gene, along with the 5' splice donor and 3' splice
acceptor sites inserted into the gene, is shown in Figure 10. The complete nucleotide
sequences of the intron boundaries (i.e., 5' splice donor and 3' splice acceptor) are shown in
Figure 11 (SEQ ID NO:5). A map showing the location of the 5' splice donor
and 3' splice acceptor sites with respect to various restriction sites (used to clone in the sites)
is shown in Figure 12. As shown schematically in Figure 13, the resulting novel Factor VIII
gene, in contrast to a full-length Factor VIII gene or a gene encoding β-domain deleted Factor
VIII, is transcribed as a pre-mRNA which contains the region encoding the β-domain, but is
then spliced to remove the majority of this region, so that the resulting mRNA is expressed as
a β-domain deleted protein. A complete expression plasmid (pLZ-6) containing the coding
sequence of this novel Factor VIII gene, as well as an engineered 5' untranslated region
containing regulatory elements designed to provide high, liver-specific expression, comprises
the nucleotide sequence shown in SEQ ID NO:3. Bases 1006-8237 of pLZ-6 (SEQ ID NO:3)
correspond to the coding region of the novel Factor VIII gene.

Accordingly, in a preferred embodiment, the invention provides a novel Factor VIII
gene comprising a non-naturally occurring intron spanning all or a portion of the β-domain
region of the gene. In one embodiment, the gene comprises the coding region of the
nucleotide sequence shown in SEQ ID NO:3. The gene may also contain further
modifications, such as additional introns, or one or more corrected consensus or near
consensus splice sites as described herein. In particular, the gene may further comprise one
or more introns upstream of the coding sequence of the gene, within the 5' UTR. As shown
in Figures 6 and 7, a preferred intron for insertion within this region is the rabbit β-globin
intron (IVS). In addition, consensus and near consensus splice site corrections can be made
to the gene, such as those shown in Figures 3 and 4(A-C).

OPTIMIZATION OF 5' AND 3' UNTRANSLATED REGIONS FOR
HIGH TISSUE-SPECIFIC GENE EXPRESSION

Novel DNAs of the invention are preferably in a form suitable for transcription and/or
expression by a cell. Generally, the DNA is contained in an appropriate vector (e.g., an
expression vector), such as a plasmid, and is operably linked to appropriate genetic regulatory
elements which are functional in the cell. Such regulatory sequences include, for example,
enhancer and promoter sequences which drive transcription of the gene. The gene may also
include appropriate signal and polyadenylation sequences which provide for trafficking of the
encoded protein to intracellular destinations or export of the mRNA. The signal sequence
may be a natural sequence of the protein or an exogenous sequence.

Suitable DNA vectors are known in the art and include, for example, DNA plasmids
and transposable genetic elements containing the aforementioned genetic regulatory and
processing sequences. Particular expression vectors which can be used in the invention
include, but are not limited to, pUC vectors (e.g., pUC19) (University of California, San
Francisco), pBR322, and pcDNA1 (Invitrogen, Inc.). An expression plasmid, pMT2LA8,
encoding a β-domain deleted Factor VIII protein is described, for example, by Pitman et al.
(1993) Blood 81(11):2925-2935. Entire coding sequences for these plasmid vectors are also
provided herein (SEQ ID NOS: 4 and 2, respectively).

Suitable regulatory sequences required for gene transcription, translation, processing
and secretion are art-recognized, and are selected to direct expression of the desired protein in
an appropriate cell. Accordingly, the term "regulatory sequence", as used herein, includes
any genetic element which is present 5' (upstream) or 3' (downstream) of the translated region
of a gene and which controls or affects expression of the gene, such as enhancer and promoter
sequences (e.g., viral promoters, such as SV40 and CMV promoters). Such regulatory
sequences are discussed, for example, in Goeddel, Gene Expression Technology: Methods in
Enzymology, page 185, Academic Press, San Diego, CA (1990), and can be selected by those
of ordinary skill in the art for use in the present invention.

In a preferred embodiment of the invention, the 5' and/or 3' untranslated regions
(UTRs) of a gene construct (e.g., a novel DNA of the invention) are optimized to provide
high, tissue-specific expression. Such optimization can include, for example, selection of
optimal tissue-specific promoters and enhancers, multimerization of genetic elements,
insertion of one or more introns within or outside of the coding sequence, correction of
near-consensus 5' splice donor and 3' splice acceptor sites within or outside of the coding
sequence, optimization of transcription initiation and termination sites, insertion of RNA
export elements, and addition of polyadenylation trimer cassettes to insulate transcription. In
preferred embodiments of the invention, a combination of the aforementioned elements and
sequence modifications is selected and engineered into the gene construct to provide
optimized expression.

For many applications of human gene therapy, it is desirable to express proteins in the
liver, which has the highest rate of protein synthesis per gram of tissue. For example,
effective gene therapy for human Factor VIII requires sufficient levels and duration of protein
expression in hepatocytes, where Factor VIII is naturally produced, and/or in endothelial cells
(ECs), where von Willebrand factor, a protein which stabilizes the secretion of Factor VIII,
is produced. Thus, in one embodiment, the invention provides a gene construct (e.g.,
expression vector) optimized to produce high levels and duration of liver-specific protein
expression. In a particular embodiment, the invention provides a human Factor VIII gene
construct, optimized to produce high levels and duration of liver-specific or
endothelium-specific protein expression. This is achieved, for example, by selecting optimal
liver-specific and endothelium-specific promoters and enhancers, and by combining these
tissue-specific elements with other genetic elements and modifications to increase gene
transcription.

Accordingly, for high levels and duration of gene expression in the liver, suitable
promoters include, for example, promoters known to contain liver-specific elements. In one
embodiment, the invention employs the thyroid binding globulin (TBG) promoter described
by Hayashi et al. (1993) Molec. Endocrinol. 7:1049-1060. As shown in Figure 21, the TBG
promoter contains hepatic nuclear factor (HNF) enhancer elements and provides the
additional advantage of having a precisely mapped transcriptional start site. This allows
insertion of a leader sequence, preferably optimized as described herein, between the
promoter and the transcriptional start site. Figure 21 also shows the complete nucleotide
sequence of the TBG promoter (SEQ ID NO:10).

For high levels and duration of gene expression in endothelium, suitable
endothelium-specific promoters include, for example, the human endothelin-1 (ET-1) gene
promoter described by Lee et al. (1990) J. Biol. Chem. 265(18), the fms-like tyrosine kinase
promoter (Flt-1) described by Morishita et al. (1995) J. Biol. Chem. 270(46), the Tie-2 promoter
described by Korhonen et al. (1995) Blood 86(5):1828-1835, and the nitric oxide synthase
promoter described by Zhang et al. (1995) J. Biol. Chem. 270(25) (see Figure 24).

Promoters selected for use in the invention are preferably paired with a suitable
ubiquitous or tissue-specific enhancer designed to augment transcription levels. For example,
in one embodiment, a liver-specific promoter, such as the TBG promoter, is used in
conjunction with a liver-specific enhancer. In a preferred embodiment, the invention
employs one or more copies of the liver-specific alpha-1 microglobulin/bikunin (ABP)
enhancer described by Rouet et al. (1992) J. Biol. Chem. 267:20765-20773, in combination
with the TBG promoter. As shown in Figure 20, the ABP enhancer contains a cluster of HNF
enhancer elements common to many liver-specific genes within a short nucleotide sequence,
making it suitable to multimerize. When multimerized, the ABP enhancer generally exhibits
increased activity and functions in either orientation within a gene construct.

Thus, in one embodiment, the invention provides an expression vector or DNA
construct comprising one or more copies of a liver-specific or endothelium-specific promoter
and a liver-specific or endothelium-specific enhancer, the promoter and enhancer being
derived from different genes, such as the thyroid binding globulin gene and the alpha-1
microglobulin/bikunin gene.

Alternatively, strong ubiquitous (i.e., non-tissue specific) enhancers can be used in
conjunction with tissue-specific promoters, such as the TBG promoter or the ET-1 promoter,
to achieve high levels and duration of tissue-specific expression. Such ubiquitous enhancers
include, for example, the human c-fos (SRE) gene enhancer described by Treisman et al.
(1986) Cell 46 which, when used in combination with liver-specific promoters (e.g., TBG) or
endothelium-specific promoters (e.g., ET-1), provides high levels of tissue-specific
expression, as demonstrated in studies described herein.

Accordingly, in a particular embodiment, the invention provides a gene construct
which is optimized for specific expression in liver cells by inserting within its 5' untranslated
region one or more copies of the ABP enhancer (preferably two copies) coupled upstream
with the TBG promoter, as shown in Figure 15. Specific gene constructs, such as pCY2 and
pDJC, containing these elements inserted upstream of the coding region for human Factor
VIII (β-domain deleted and full-length with intron spanning the β-domain), are shown in
Figures 6 and 7, respectively. In another particular embodiment, the gene construct is
optimized for specific expression in endothelial cells by inserting within its 5' region one or
more copies of the c-fos SRE enhancer, or an endothelial-specific enhancer (e.g., the human
tissue factor (hTF/m) enhancer described by Parry et al. (1995) Arterioscler. Thromb. Vasc.
Biol. 15:612-621) coupled upstream with the ET-1 promoter.

In addition to selecting optimal promoters and enhancers, optimization of a gene
construct can include the use of other genetic elements within the transcriptional unit of the
gene to increase and/or prolong expression. In one embodiment, one or more introns (e.g.,
non-naturally occurring introns) are inserted into the 5' or 3' untranslated region (UTR) of the
gene. Introns from a broad variety of known genes (e.g., mammalian genes) can be used for
this purpose. In one embodiment, the invention employs the first intron (IVS) from the rabbit
β-globin gene comprising the nucleotide sequence shown in Figure 23 (SEQ ID NO:6).

In cases where the intron does not contain consensus 5' splice donor and 3' splice
acceptor sites, or a consensus branch and pyrimidine tract sequence, the intron is preferably
optimized (modified) to render these sites completely consensus. This can be achieved, for
example, by substituting one or more nucleotides within the 5' or 3' splice site, as previously
described herein, to render the site consensus. For example, when using the rabbit β-globin
intron, the nucleotide sequence can be modified as shown in Figure 16 to render the 5' splice
donor and 3' splice acceptor sites, and the pyrimidine tract, entirely consensus. This can
facilitate efficient transcription and export of the gene message out of the cell nucleus,
thereby increasing expression. Exemplary nucleotide substitutions within the rabbit β-globin
IVS which can be made to achieve this result are shown in Figure 23, which shows a
comparison of the sequence for the unmodified (wild-type) rabbit β-globin intron (SEQ ID
NO:6) and the same sequence modified to render the 5' splice donor and 3' splice acceptor
sites, and the pyrimidine tract, entirely consensus (SEQ ID NO:7).
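
The correction of a near-consensus splice site described above can be illustrated with a short sketch. It assumes the commonly cited mammalian 5' splice donor consensus MAG|GTRAGT (M = A or C, R = A or G), which is not taken from this specification, and it uses a hypothetical nine-base site. The function names are likewise illustrative only.

```python
# Sketch: find and correct mismatches between a candidate 5' splice donor
# site and a consensus pattern. The consensus below is an assumption (the
# commonly cited mammalian donor consensus), not a sequence from the source.

# IUPAC codes used by the pattern; the first listed base is used for corrections.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}

DONOR_CONSENSUS = "MAGGTRAGT"  # exon MAG | GTRAGT intron (assumed)


def mismatches(site, consensus=DONOR_CONSENSUS):
    """Return the 0-based positions where the site departs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return [i for i, (base, code) in enumerate(zip(site.upper(), consensus))
            if base not in IUPAC[code]]


def correct_to_consensus(site, consensus=DONOR_CONSENSUS):
    """Substitute only the mismatched bases so the site becomes fully consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return "".join(base if base in IUPAC[code] else IUPAC[code][0]
                   for base, code in zip(site.upper(), consensus))


if __name__ == "__main__":
    near_consensus = "CAGGTTGGT"  # hypothetical near-consensus donor site
    print(mismatches(near_consensus))            # positions needing substitution
    print(correct_to_consensus(near_consensus))  # site after correction
```

Only the mismatched positions are changed, which mirrors the idea of substituting one or more nucleotides rather than replacing the whole site.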

When engineering one or more introns into the 5' UTR of a gene construct, the intron
can be inserted into the leader sequence of the gene, as shown in Figures 15, 16 and 22.
Accordingly, the intron can be inserted within the leader sequence, downstream from the
promoter and enhancer elements. This can be done in conjunction with one or more
additional modifications to the leader sequence, all of which serve to increase transcription,
stability and export of mRNAs. Such additional modifications include, for example,
optimizing the translation initiation site (Kozak et al. (1986) Cell 44:283) and/or the
secondary structure of the leader sequence (Kozak et al. (1994) J. Mol. Biol. 235:95).

Accordingly, in a preferred embodiment, the invention provides a gene construct
which contains within its transcriptional unit, one or a combination of the foregoing genetic
elements and sequence modifications designed to provide high levels and duration of gene
expression, optionally in a tissue-specific manner. In a particular embodiment, the construct
contains a gene encoding human Factor VIII (e.g., β-domain deleted or full-length), having a
5' untranslated region which is optimized to provide significant levels and duration of
liver-specific or endothelium-specific expression.

Particularly preferred gene constructs of the invention include, for example, those
comprising the nucleotide sequences shown in SEQ ID NO:2 and SEQ ID NO:4, referred to
herein respectively as pCY-2 and pLZ-6. These constructs contain the coding sequences for
human β-domain deleted Factor VIII (pCY-2) and full-length human Factor VIII (containing
an intron spanning the β-domain) (pLZ-6) downstream from an optimized 5' UTR designed
to provide high levels and duration of human Factor VIII expression in liver cells. Other
preferred gene constructs comprise the identical 5' UTR of pCY-2 and pLZ-6, in conjunction
with coding sequences for other proteins desired to be expressed in the liver (e.g., other blood
coagulation factors, such as human Factor IX).

As shown in Figures 7, 15 and 16, plasmids pCY-2 and pLZ-6 contain 5' UTRs
comprising a novel combination of regulatory elements and sequence modifications shown
herein to provide high levels and duration of human Factor VIII expression, both in vitro and
in vivo, in liver cells. Specifically, each construct comprises within its 5' UTR, sequentially
from 5' to 3', (a) two copies of the ABP enhancer (SEQ ID NO:9), (b) one copy of the TBG
promoter (SEQ ID NO:10), and (c) an optimized 71 nucleotide leader sequence (SEQ ID
NO:11) split by intron 1 of the rabbit β-globin gene. The intron is optimized to contain consensus
splice acceptor, donor and pyrimidine tract sites.

The leader sequence within the 5' UTR of pCY-2 and pLZ-6 also contains an
optimized translation initiation site (SEQ ID NO:8). Specifically, the human Factor VIII gene
contains a cytosine at the +4 position, following the AUG start codon. This base was changed
to a guanine, resulting in an amino acid change within the signal sequence of the protein from
a glutamine to a glutamic acid. The leader sequence was further designed to have no RNA
secondary structure, as predetermined by an RNA-folding algorithm (Figure 16) (Kozak et al.
(1994) J. Mol. Biol. 235:95).
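
The effect of the +4 substitution on the encoded protein can be checked with a minimal sketch. The six-base input is an illustrative stand-in for the first two codons of a coding sequence and is not copied from any SEQ ID NO; the function names and the partial codon table are assumptions made for the example.

```python
# Sketch: effect of changing the +4 base (the first base after the ATG start
# codon) from C to G. The example sequence is illustrative only.

# Partial codon table covering only the codons needed here.
CODON_TABLE = {"ATG": "Met", "CAA": "Gln", "CAG": "Gln", "GAA": "Glu", "GAG": "Glu"}


def set_plus4(cds, new_base):
    """Return cds with the +4 position replaced (+1 is the A of the ATG)."""
    cds = cds.upper()
    if not cds.startswith("ATG") or len(cds) < 6:
        raise ValueError("coding sequence must begin with ATG and a second codon")
    return cds[:3] + new_base.upper() + cds[4:]


def second_residue(cds):
    """Translate the codon that immediately follows the start codon."""
    return CODON_TABLE[cds[3:6].upper()]


if __name__ == "__main__":
    original = "ATGCAA"                  # Met-Gln (hypothetical input)
    modified = set_plus4(original, "G")  # ATGGAA
    print(second_residue(original), "->", second_residue(modified))
```

Because both glutamine codons (CAA and CAG) differ from the glutamic acid codons (GAA and GAG) only at the first base, a C to G change at +4 converts glutamine to glutamic acid whichever glutamine codon is present.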

In addition to optimization of the 5' UTR of a gene construct, the 3' UTR can also be
engineered to include one or more genetic elements or sequence modifications which increase
and/or prolong expression of the gene. For example, the 3' UTR can be modified to provide
optimal RNA processing, export and mRNA stability. In one embodiment of the invention,
this is done by increasing translational termination efficiency. In mammalian RNAs,
translational termination is generally optimal if the base following the stop codon is a purine
(McCaughan et al. (1995) PNAS 92:5431). In the case of the human Factor VIII gene, the
UGA stop codon is followed by a guanine and is thus already optimal. However, in other
gene constructs of the invention which do not naturally contain an optimized translational
termination sequence, the termination sequence can be optimized using, for example,
site-directed mutagenesis, to replace the base following the stop codon with a purine.
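
The termination-context rule above (a purine immediately after the stop codon) lends itself to a small check. The sketch below is an illustration under stated assumptions: the sequences are hypothetical, the stop codon position is supplied by the caller, and the default replacement purine (G) is an arbitrary choice, since the text does not prefer one purine over the other.

```python
# Sketch: test whether the base following a stop codon is a purine and,
# if not, substitute one. Sequences and the default purine are assumptions.

STOP_CODONS = {"UAA", "UAG", "UGA"}
PURINES = {"A", "G"}


def base_after_stop(mrna, stop_start):
    """Return the base immediately 3' of the stop codon starting at stop_start."""
    mrna = mrna.upper()
    if mrna[stop_start:stop_start + 3] not in STOP_CODONS:
        raise ValueError("no stop codon at the given position")
    if stop_start + 3 >= len(mrna):
        raise ValueError("no base follows the stop codon")
    return mrna[stop_start + 3]


def is_optimal(mrna, stop_start):
    """True if the stop codon is followed by a purine."""
    return base_after_stop(mrna, stop_start) in PURINES


def optimize_termination(mrna, stop_start, purine="G"):
    """Replace a pyrimidine after the stop codon with the chosen purine."""
    if purine not in PURINES:
        raise ValueError("replacement must be A or G")
    mrna = mrna.upper()
    if is_optimal(mrna, stop_start):
        return mrna
    i = stop_start + 3
    return mrna[:i] + purine + mrna[i + 1:]


if __name__ == "__main__":
    print(is_optimal("GCCUGAGGC", 3))            # UGA followed by G
    print(optimize_termination("GCCUAACUU", 3))  # UAA followed by C is changed
```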

In particular gene constructs of the invention which contain the human Factor VIII
gene, the 3' UTR can further be modified to remove one or more of the three pentamer
sequences AUUUA present in the 3' UTR of the gene. This can increase the stability of the
message. Alternatively, the 3' UTR of the human Factor VIII gene, or any gene having a
short-lived messenger RNA, can be switched with the 3' UTR of a gene associated with a
message having a longer lifespan.
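
Locating AUUUA pentamers in a 3' UTR is a simple pattern search, sketched below. The UTR string is hypothetical, and disrupting each motif by replacing its central U with C is an illustrative choice only; the text speaks of removing the pentamers without specifying how.

```python
# Sketch: locate AUUUA pentamers in a 3' UTR and disrupt them by a single
# base substitution. The example UTR and the substitution are assumptions.
import re


def find_pentamers(utr, motif="AUUUA"):
    """Return 0-based start positions of the motif, including overlaps."""
    pattern = "(?=" + re.escape(motif) + ")"  # lookahead so overlaps are found
    return [m.start() for m in re.finditer(pattern, utr.upper())]


def disrupt_pentamers(utr, replacement="C"):
    """Replace the central U of every AUUUA so the motif no longer occurs."""
    bases = list(utr.upper())
    for start in find_pentamers(utr):
        bases[start + 2] = replacement
    return "".join(bases)


if __name__ == "__main__":
    utr = "GGAUUUACCAUUUAUUUAGG"  # hypothetical 3' UTR fragment
    print(find_pentamers(utr))
    print(disrupt_pentamers(utr))
```

A lookahead pattern is used because adjacent pentamers can share a terminal A, as in AUUUAUUUA, and a plain search would miss the second copy.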

Additional modifications for optimizing gene constructs of the invention include
insertion of one or more poly A trimer cassettes for optimal polyadenylation and 3' end
formation. These can be inserted within the 5' UTR or the 3' UTR of the gene. In a preferred
embodiment, the gene construct is flanked on either side by a poly A trimer cassette, as shown
in Figure 15. These cassettes can inhibit transcription originating outside of the desired
promoter in the transcriptional unit, ensuring that transcription of the gene occurs only in the
tissue where the promoter is active (Maxwell et al. (1989) Biotechniques 1989 3:276).
Additionally, because the poly A trimer cassette functions in both orientations, i.e., on each
DNA strand, it can be utilized at the 3' end of the gene for transcriptional termination and
polyadenylation, as well as to inhibit bottom strand transcription and production of antisense
RNA.

In further embodiments of the invention, gene optimization includes the addition of
viral elements for accessing non-splicing RNA export pathways. The majority of mRNAs in
higher eukaryotes contain intronic sequences which are removed within the nucleus, followed
by export of the mRNA into the cytoplasm. This is referred to as the splicing pathway.
However, as shown in Figure 17, mammalian intronless genes, hepadnaviruses (e.g., HBV),
and many retroviruses access a nonsplicing pathway which is facilitated by cellular RNA
export proteins and/or specific sequences within the RNA. This is referred to as the
facilitated pathway.

In a particular embodiment, the gene construct is modified to include one or more
copies of the post-transcriptional regulatory element (PRE) from hepatitis B virus. This 587
base pair element, and its function in facilitating export of mRNAs from the nucleus, are
described in U.S. Patent No. 5,744,326. Generally, the PRE element is placed within the 3'
UTR of the gene, and can be inserted as two or more copies to further increase expression, as
shown in Figure 18 (plasmid pCY-401 versus plasmid pCY-402).

Gene constructs (e.g., expression vectors) of the invention can still further include
sequence elements which impart both an autonomous replication activity (i.e., so that when
the cell replicates, the plasmid replicates as well) and nuclear retention as an episome.
Generally, these sequence elements are included outside of the transcriptional unit of the gene
construct. Suitable sequences include those functional in mammalian cells, such as the oriP
sequence and EBNA-1 gene from the Epstein-Barr virus (Yates et al. (1985) Nature
313:812). Other suitable sequences include the E. coli origin of replication, as shown in
Figures 6 and 7.

Gene constructs of the invention, such as pDJC, pCY-2, pCY-6, pLZ-6 and
pCY2-SE5, have been described above, but are not intended to be limiting. Other novel constructs
can be made in accordance with the guidelines provided herein, and are intended to be
included within the scope of the present invention.

INCREASED CYTOPLASMIC RNA ACCUMULATION AND EXPRESSION

Novel DNAs (e.g., genes) of the present invention are modified to increase
expression, for example, by facilitating cytoplasmic accumulation of mRNA transcribed from
the DNA and by optimizing the 5' and 3' untranslated regions of the DNA. Accordingly,
cytoplasmic mRNA accumulation and/or expression of the DNA is increased relative to the
same DNA in unmodified form.

To evaluate (e.g., quantify) levels of nuclear or cytoplasmic mRNA accumulation
obtained following transcription of novel DNAs and vectors of the invention, a variety of
art-recognized techniques can be employed, such as those described in Sambrook et al.
"Molecular Cloning," 2d ed., and in the examples below. Such techniques include, for
instance, Northern blot analysis, using total nuclear or cytoplasmic RNA. This assay can,
optionally, be normalized using mRNA transcribed from a control gene, such as a gene
encoding glyceraldehyde phosphate dehydrogenase (GAPDH). Levels of nuclear and
cytoplasmic RNA accumulation can then be compared for novel DNAs of the invention to
determine whether an increase has occurred following correction of one or more consensus or
near consensus splice sites, and/or by addition of one or more non-naturally occurring introns
into the DNA.
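
The normalization and comparison step described above reduces to simple arithmetic, sketched here. The band intensities are hypothetical numbers in arbitrary densitometry units, and the function names are assumptions; the sketch only shows how a GAPDH-normalized signal for a modified DNA might be compared with that of the unmodified DNA.

```python
# Sketch: normalize a Northern blot signal to a GAPDH control and compare
# modified versus unmodified DNA. All intensity values are hypothetical.


def normalized_signal(target, control):
    """Target band intensity divided by the control (e.g., GAPDH) intensity."""
    if control <= 0:
        raise ValueError("control intensity must be positive")
    return target / control


def fold_change(mod_target, mod_control, unmod_target, unmod_control):
    """Ratio of normalized signal for the modified DNA to the unmodified DNA."""
    baseline = normalized_signal(unmod_target, unmod_control)
    if baseline == 0:
        raise ValueError("unmodified signal is zero; fold change is undefined")
    return normalized_signal(mod_target, mod_control) / baseline


if __name__ == "__main__":
    # Cytoplasmic mRNA band intensities (arbitrary units, hypothetical).
    print(fold_change(300.0, 100.0, 150.0, 100.0))
```

A fold change above 1 would indicate increased accumulation relative to the unmodified DNA, subject to the usual replicate and loading considerations that this sketch does not model.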

Novel DNAs of the invention can also be assayed for altered splicing patterns using
similar techniques. For example, as described in the examples below, to determine whether a
non-naturally occurring intron has been successfully incorporated into a DNA so that it is
correctly spliced during mRNA processing, cytoplasmic mRNA can be assayed by Northern
blot analysis, reverse transcriptase PCR (RT-PCR), or RNase protection assays. Such assays
are used to determine the size of the mRNA produced from the novel DNA
containing the non-naturally occurring intron. The size of the mRNA can then be compared
to the size of the DNA with and without the intron to determine whether splicing has been
achieved, and whether the splicing pattern corresponds to that expected based on the size of
the added intron.
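
The size comparison described above can be expressed as a short calculation. This is a sketch under stated assumptions: the lengths are hypothetical, the tolerance stands in for the resolution of the assay, and the helper names are illustrative.

```python
# Sketch: compare an observed transcript size with the sizes expected for
# spliced and unspliced RNA. Lengths and tolerance are hypothetical values.


def expected_sizes(unspliced_length, intron_length):
    """Expected transcript lengths (nucleotides) with and without the intron."""
    if not 0 < intron_length < unspliced_length:
        raise ValueError("intron must be shorter than the unspliced transcript")
    return {"unspliced": unspliced_length,
            "spliced": unspliced_length - intron_length}


def classify(observed, unspliced_length, intron_length, tolerance=50):
    """Label the observed size by the closest expected size within tolerance."""
    sizes = expected_sizes(unspliced_length, intron_length)
    label, size = min(sizes.items(), key=lambda item: abs(observed - item[1]))
    return label if abs(observed - size) <= tolerance else "unexpected"


if __name__ == "__main__":
    # Hypothetical 5000-nt pre-mRNA carrying a 500-nt added intron.
    print(classify(4500, 5000, 500))
    print(classify(3000, 5000, 500))
```

An "unexpected" result would correspond to a splicing pattern that does not match the size of the added intron, for example use of a cryptic splice site.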

Alternatively, protein expressed from cytoplasmic RNA can be assayed by SDS-PAGE
analysis and sequenced to confirm that correct splicing has been achieved.

To measure expression levels, novel DNAs of the invention can also be tested in a
variety of art-recognized expression assays. Suitable expression assays, as illustrated in the
examples provided below, include quantitative ELISA (Zatloukal et al. (1994) PNAS 91:
5148-5152), radioimmunoassay (RIA), and enzyme activity assays. When expression of
Factor VIII protein is being measured, in particular, Factor VIII activity assays such as the
KabiCoATest (Kabi Inc., Sweden) can be employed to quantify expression.

GENE DELIVERY TO CELLS 

Following insertion into an appropriate vector, novel DNAs of the invention can be
delivered to cells either in vitro or in vivo. For example, the DNA can be transfected into
cells in vitro using standard transfection techniques, such as calcium phosphate precipitation
(O'Mahoney et al. (1994) DNA & Cell Biol. 13(12):1227-1232). Alternatively, the gene can
be delivered to cells in vivo by, for example, intravenous or intramuscular injection.

In one embodiment of the invention, the gene is targeted for delivery to a specific cell
by linking the plasmid to a carrier molecule containing a ligand which binds to a component
on the surface of a cell, thereby forming a polynucleotide-carrier complex. The carrier can
further comprise a nucleic acid binding agent which noncovalently mediates linkage of the
DNA to the ligand of the carrier molecule.

The carrier molecule of the polynucleotide-carrier complex performs at least two
functions: (1) it binds the polynucleotide (e.g., the plasmid) in a manner which is
sufficiently stable (either in vivo, ex vivo, or in vitro) to prevent significant uncoupling of
the polynucleotide extracellularly prior to internalization by a target cell, and (2) it binds
to a component on the surface of a target cell so that the polynucleotide-carrier complex is
internalized by the cell. Generally, the carrier is made up of a cell-specific ligand and a
cationic moiety which, for example, are conjugated. The cell-specific ligand binds to a cell
surface component, such as a protein, polypeptide, carbohydrate, lipid or combination
thereof. It typically binds to a cell surface receptor. The cationic moiety binds, e.g.,
electrostatically, to the polynucleotide.

The ligand of the carrier molecule can be any natural or synthetic ligand which
binds a cell surface receptor. The ligand can be a protein, polypeptide, glycoprotein,
glycopeptide, glycolipid or synthetic carbohydrate which has functional groups that are
exposed sufficiently to be recognized by the cell surface component. It can also be a
component of a biological organism such as a virus or a cell (e.g., mammalian, bacterial,
protozoan).

Alternatively, the ligand can comprise an antibody, antibody fragment (e.g., an
F(ab')2 fragment) or analogues thereof (e.g., single chain antibodies) which binds the cell
surface component (see e.g., Chen et al. (1994) FEBS Letters 338:167-169, Ferkol et al.
(1993) J. Clin. Invest. 92:2394-2400, and Rojanasakul et al. (1994) Pharmaceutical Res.
11(12):1731-1736). Such antibodies can be produced by standard procedures.

Ligands useful in forming the carrier will vary according to the particular cell to be
targeted. For targeting hepatocytes, proteins, polypeptides and synthetic compounds
containing galactose-terminal carbohydrates, such as carbohydrate trees obtained from
natural glycoproteins or chemically synthesized, can be used. For example, natural
glycoproteins that either contain terminal galactose residues or can be treated
to expose terminal galactose residues (e.g., by chemical or enzymatic desialylation) can be
used. In one embodiment, the ligand is an asialoglycoprotein, such as asialoorosomucoid,
asialofetuin or desialylated vesicular stomatitis virus. In another embodiment, the ligand is a
tri- or tetra-antennary carbohydrate moiety.

Alternatively, suitable ligands for targeting hepatocytes can be prepared by
chemically coupling galactose-terminal carbohydrates (e.g., galactose, mannose, lactose,
arabinogalactan etc.) to nongalactose-bearing proteins or polypeptides (e.g., polycations) by,
for example, reductive lactosamination. Methods of forming a broad variety of other
synthetic glycoproteins having exposed terminal galactose residues, all of which can be used
to target hepatocytes, are described, for example, by Chen et al. (1994) Human Gene Therapy
5:429-435 and Ferkol et al. (1993) FASEB 7:1081-1091 (galactosylation of polycationic
histones and albumins using EDC); Perales et al. (1994) PNAS 91:4086-4090 and Midoux et
al. (1993) Nucleic Acids Research 21(4):871-878 (lactosylation and galactosylation of
polylysine using α-D-galactopyranosyl phenylisothiocyanate and 4-isothiocyanatophenyl
β-D-lactoside); Martinez-Fong (1994) Hepatology 20(6):1602-1608 (lactosylation of polylysine
using sodium cyanoborohydride and preparation of asialofetuin-polylysine conjugates using
SPDP); and Plank et al. (1992) Bioconjugate Chem. 3:533-539 (reductive coupling of four
terminal galactose residues to a synthetic carrier peptide, followed by linking the carrier to
polylysine using SPDP).

For targeting the polynucleotide-carrier complex to other cell surface receptors, the
carrier component of the complex can comprise other types of ligands. For example,
mannose can be used to target macrophages (lymphoma) and Kupffer cells, mannose
6-phosphate glycoproteins can be used to target fibroblasts (fibrosarcoma), intrinsic
factor-vitamin B12 and bile acids (see Kramer et al. (1992) J. Biol. Chem. 267:18598-18604)
can be used to target enterocytes, insulin can be used to target fat cells and muscle cells
(see e.g., Rosenkranz et al. (1992) Experimental Cell Research 199:323-329 and Huckett
et al. (1990) Chemical Pharmacology 40(2):253-263), transferrin can be used to target
smooth muscle cells (see e.g., Wagner et al. (1990) PNAS 87:3410-3414 and U.S. Patent
No. 5,354,844 (Beug et al.)), Apolipoprotein E can be used to target nerve cells, and
pulmonary surfactants, such as Protein A, can be used to target epithelial cells (see e.g.,
Ross et al. (1995) Human Gene Therapy 6:31-40).

The cationic moiety of the carrier molecule can be any positively charged species
capable of electrostatically binding to negatively charged polynucleotides. Preferred cationic
moieties for use in the carrier are polycations, such as polylysine (e.g., poly-L-lysine),
polyarginine, polyornithine, spermine, basic proteins such as histones (Chen et al., supra),
avidin, protamines (see e.g., Wagner et al., supra), modified albumin (i.e., N-acylurea
albumin) (see e.g., Huckett et al., supra) and polyamidoamine cascade polymers (see e.g.,
Haensler et al. (1993) Bioconjugate Chem. 4:372-379). A preferred polycation is polylysine
(e.g., ranging from 3,800 to 60,000 daltons). Other preferred cationic moieties for use in the
carrier are cationic liposomes.

In one embodiment, the carrier comprises polylysine having a molecular weight of
about 17,000 daltons (purchased as the hydrogen bromide salt having a MW of about 26,000
daltons), corresponding to a chain length of approximately 100-120 lysine residues. In
another embodiment, the carrier comprises a polycation having a molecular weight of about
2,600 daltons (purchased as the hydrogen bromide salt having a MW of about 4,000 daltons),
corresponding to a chain length of approximately 15-10 lysine residues.

The carrier can be formed by linking a cationic moiety and a cell-specific ligand using
standard cross-linking reagents which are well known in the art. The linkage is typically
covalent. A preferred linkage is a peptide bond. This can be formed with a water soluble
carbodiimide, such as 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride (EDC),
as described by McKee et al. (1994) Bioconjugate Chem. 5:306-311 or Jung, G. et al. (1981)
Biochem. Biophys. Res. Commun. 101:599-606 or Grabarek et al. (1990) Anal. Biochem.
185:131. Alternative linkages are disulfide bonds which can be formed using cross-linking
reagents, such as N-Succinimidyl 3-(2-pyridyldithio)propionate (SPDP),
N-hydroxysuccinimidyl ester of chlorambucil, N-Succinimidyl-(4-Iodoacetyl)aminobenzoate
(SIAB), Sulfo-SIAB, and Sulfo-succinimidyl-4-maleimidophenyl-butyrate (Sulfo-SMPB).
Strong noncovalent linkages, such as avidin-biotin interactions, can also be used to link
cationic moieties to a variety of cell binding agents to form suitable carrier molecules.

The linkage reaction can be optimized for the particular cationic moiety and cell
binding agent used to form the carrier. The optimal ratio (w:w) of cationic moiety to cell
binding agent can be determined empirically. This ratio will vary with the size of the cationic
moiety (e.g., polycation) being used in the carrier, and with the size of the polynucleotide to
be complexed. However, this ratio generally ranges from about 0.2-5.0 (cationic moiety :
ligand). Uncoupled components and aggregates can be separated from the carrier by
molecular sieve or ion exchange chromatography (e.g., Aquapore™ cation exchange,
Rainin).

In one embodiment of the invention, a carrier made up of a conjugate of
asialoorosomucoid and polylysine is formed with the cross linking agent
1-(3-dimethylaminopropyl)-3-ethyl carbodiimide. After dialysis, the conjugate can be separated
from unconjugated components by preparative acid-urea polyacrylamide gel electrophoresis
(pH 4-5).

Following formation of the carrier molecule, the polynucleotide (e.g., plasmid) is
linked to the carrier so that (a) the polynucleotide is sufficiently stable (either in vivo, ex vivo,
or in vitro) to prevent significant uncoupling of the polynucleotide extracellularly prior to
internalization by the target cell, (b) the polynucleotide is released in functional form under
appropriate conditions within the cell, (c) the polynucleotide is not damaged and (d) the
carrier retains its capacity to bind to cells. Generally, the linkage between the carrier and the
polynucleotide is noncovalent. Appropriate noncovalent bonds include, for example,
electrostatic bonds, hydrogen bonds, hydrophobic bonds, anti-polynucleotide antibody
binding, linkages mediated by intercalating agents, and streptavidin or avidin binding to
polynucleotide-containing biotinylated nucleotides. However, the carrier can also be directly
(e.g., covalently) linked to the polynucleotide using, for example, chemical cross-linking
agents (e.g., as described in WO-A-91/04753 (Cetus Corp.), entitled "Conjugates of
Antisense Oligonucleotides and Therapeutic Uses Thereof").

As described in Example 4, polynucleotide-carrier complexes can be formed by
combining a solution containing carrier molecules with a solution containing a
polynucleotide to be complexed, preferably so that the resulting composition is isotonic.

ADMINISTRATION 

Novel DNAs of the invention can be administered to cells either in vitro or in vivo for 
transcription and/or expression therein. 

For in vitro delivery, cultured cells can be incubated with the DNA in an appropriate 
medium under suitable transfection conditions, as is well known in the art. 

For in vivo delivery (e.g., in methods of gene therapy), DNAs of the invention 
(preferably contained within a suitable expression vector) can be administered to a subject 
in a pharmaceutically acceptable carrier. The term "pharmaceutically acceptable carrier", as 
used herein, is intended to include any physiologically acceptable vehicle for stabilizing 
DNAs of the present invention for administration in vivo, including, for example, saline and 
aqueous buffer solutions, solvents, dispersion media, antibacterial and antifungal agents, 
isotonic and absorption delaying agents, and the like. The use of such media and agents for 
pharmaceutically active substances is well known in the art. Except insofar as any 
conventional medium or agent is incompatible with the polynucleotide-carrier complexes of 
the present invention, use thereof in a therapeutic composition is contemplated. 

Accordingly, novel DNAs of the invention can be combined with pharmaceutically 
acceptable carriers to form a pharmaceutical composition. In all cases, the pharmaceutical 
composition must be sterile and must be fluid to the extent that easy syringability exists. It 
must be stable under the conditions of manufacture and storage and must be preserved against 
the contaminating action of microorganisms such as bacteria and fungi. Protection of the 
polynucleotide-carrier complexes from degradative enzymes (e.g., nucleases) can be achieved 
by including in the composition a protective coating or nuclease inhibitor. Prevention of the 
action of microorganisms can be achieved by various anti-bacterial and anti-fungal agents, for 
example, parabens, chlorobutanol, phenol, ascorbic acid, thimerosal, and the like. 

Novel DNAs of the invention may be administered in vivo by any suitable route of 
administration. The appropriate dosage may vary according to the selected route of 
administration. The DNAs are preferably injected intravenously in a solution containing a 
pharmaceutically acceptable carrier, as defined herein. Sterile injectable solutions can be 
prepared by incorporating the DNA in the required amount in an appropriate buffer with 
one or a combination of ingredients enumerated above or below, followed by filter 
sterilization. Other suitable routes of administration include intravascular, subcutaneous 
(including slow-release implants), topical and oral. 

Appropriate dosages may be determined empirically, as is routinely practiced in 
the art. For example, mice can be administered dosages of up to 1.0 mg of DNA per 20 g 
of mouse, or about 1.0 mL of DNA in solution per 1.4 mL of mouse blood. 

Administration of a novel DNA, or protein expressed therefrom, to a subject can be 
in any pharmacological form including a therapeutically active amount of DNA or protein, 
in combination with another therapeutic molecule. Administration of a therapeutically 
active amount of a pharmaceutical composition of the present invention is defined as an 
amount effective, at dosages and for periods of time necessary, to achieve the desired result 
(e.g., an improvement in clinical symptoms). A therapeutically active amount of DNA or 
protein may vary according to factors such as the disease state, age, sex, and weight of the 
individual. Dosage regimens may be adjusted to provide the optimum therapeutic 
response. For example, several divided doses may be administered daily or the dose may 
be proportionally reduced as indicated by the exigencies of the therapeutic situation. 

USES 

Novel DNAs of the present invention can be used to efficiently express a desired 
protein within a cell. Accordingly, such DNAs can be used in any context in which gene 
transcription and/or expression is desired. 

In one embodiment, the DNA is used in a method of gene therapy to treat a clinical 
disorder. In another embodiment, the DNA is used in antisense therapy to produce 
sufficient levels of nuclear and/or cytoplasmic mRNA to inhibit expression of a gene. In 
another embodiment, the DNA is used to study RNA processing and/or gene regulation in 
vitro or in vivo. In another embodiment, the DNA is used to produce therapeutic or 
diagnostic proteins which can then be administered to patients as exogenous proteins. 

Methods for increasing levels of cytoplasmic RNA accumulation and gene 
expression provided by the present invention can also be used for any and all of the 
foregoing purposes. 

In a preferred embodiment, the invention provides a method of increasing 
expression of a gene encoding human Factor VIII. Accordingly, the invention also 
provides an improved method of human Factor VIII gene therapy involving administering 
to a patient afflicted with a disease characterized by a deficiency in Factor VIII a novel 
Factor VIII gene in an amount sufficient to treat the disease. 

In addition, the present invention provides a novel method for altering the 
transcription pattern of a DNA. By correcting one or more consensus or near consensus 
splice sites within the DNA, or by adding one or more introns to the DNA, the natural 
splicing pattern of the DNA will be modified and, at the same time, expression may be 
increased. Accordingly, methods of the invention can be used to tailor the transcription of 
a DNA so that a greater amount of a particular desired RNA species is transcribed and 
ultimately expressed, relative to other RNA species transcribed from the DNA (i.e., 
alternatively spliced RNAs). 

Methods of the invention can also be used to modify the coding sequence of a 
given DNA, so that the structure of the protein expressed from the DNA is altered in a 
beneficial manner. For example, introns can be added to the DNA so that portions of the 
gene will be removed during transcription and, thus, not be expressed. Preferred gene 
portions for removal in this manner include those encoding, e.g., antigenic regions of a 
protein and/or regions not required for activity. Alternatively or additionally, consensus or 
near consensus splice sites can be corrected within the DNA so that previously 
recognizable (i.e., operable) introns and exons are no longer recognized by a cell's splicing 
machinery. This alters the coding sequence of the mRNA ultimately transcribed from the 
DNA, and can also facilitate its export from the nucleus to the cytoplasm where it can be 
expressed. 

This invention is illustrated further by the following examples which should not be 
construed as further limiting the subject invention. The contents of all references and 
published patent applications cited throughout this application are hereby incorporated by 
reference. 


EXAMPLES 

EXAMPLE 1 - Construction of a Human Factor VIII Gene Containing an Intron 
Spanning the β-Domain 

A full-length human Factor VIII cDNA containing an intron spanning the section of 
the cDNA encoding amino acids 745-1638 (Figure 11) was constructed as described below. 
Amino acid numbering was designated starting with Met-1 of the mature human Factor VIII 
protein and, thus, does not include the 19 amino acid signal peptide of the protein. The 
β-domain region of a human Factor VIII protein is made up of 983 amino acids (Vehar et al. 
(1984) Nature 312: 337-342). Thus, the region of the cDNA spliced out during pre-mRNA 
processing corresponds to about 89% of the β-domain. 

To select suitable sites for inserting the 5' splice donor (SD) and 3' splice acceptor 
(SA) sites, the sequence of the full-length Factor VIII cDNA expression plasmid pCY-6 
(SEQ ID NO:4) was scanned for convenient restriction enzyme sites. Restriction sites were 
selected according to the following criteria: (a) they flanked and were in close proximity to 
the sites into which the splicing signals were to be introduced, so that any PCR fragment 
generated to fill in the region between these sites would have as little chance as possible of 
carrying undesired point mutations introduced by the process of PCR; (b) they would cut the 
expression plasmid in as few places as possible, preferably only at the site flanking the region 
of splice site introduction. 

The restriction sites chosen according to these criteria for cloning in the splice donor 
site were: Kpn I (base 2816 of the coding sequence of pCY-6, or base 3822 of the complete 
nucleotide sequence of pCY-6 provided in SEQ ID NO:4, since the first 1005 bases of this 
plasmid are non-coding bases), and Tth111 I (base 3449 of the coding sequence of pCY-6, or 
base 4455 of the complete nucleotide sequence of pCY-6 shown in SEQ ID NO:4). The 
restriction sites chosen according to these criteria for cloning in the splice acceptor site were: 
Bcl I (bases 1407 and 5424 of the coding sequence of pCY-6, or bases 2413 and 6430 of the 
complete nucleotide sequence of pCY-6 shown in SEQ ID NO:4) and BspE I (base 7228 of 
the coding sequence of pCY-6, or base 8234 of the complete nucleotide sequence of pCY-6 
shown in SEQ ID NO:4). 
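
The paired coordinates listed above can be cross-checked with a short sketch. The 
dictionary and function names are hypothetical and are not part of the specification; the 
numbers are copied from the site list. Every listed pair differs by the same amount, 1006, 
which is one more than the 1005 non-coding bases mentioned, so the convention used for 
the first coding base is worth confirming against SEQ ID NO:4 before reusing the offset. 

```python
# Illustrative only: names below are hypothetical, not from the specification.
# Each pair is (base in pCY-6 coding-sequence numbering,
#               base in complete-plasmid numbering of SEQ ID NO:4),
# copied from the restriction sites listed above.
SITE_COORDS = {
    "Kpn I": [(2816, 3822)],
    "Tth111 I": [(3449, 4455)],
    "Bcl I": [(1407, 2413), (5424, 6430)],
    "BspE I": [(7228, 8234)],
}


def coordinate_offsets(sites):
    """Return the set of (plasmid - coding) differences across all listed sites."""
    return {plasmid - coding for pairs in sites.values() for coding, plasmid in pairs}


def coding_to_plasmid(coding_base, offset):
    """Convert a coding-sequence position to complete-plasmid numbering."""
    return coding_base + offset


offsets = coordinate_offsets(SITE_COORDS)
# A single shared offset means the two numbering schemes are mutually consistent.
assert len(offsets) == 1
OFFSET = offsets.pop()
print(OFFSET)
```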

Generation of Splice Donor Site 
A fragment containing the region of Factor VIII cDNA from the Kpn I site to the 
Tth111 I site, with the above described splice donor sequence inserted at the appropriate spot, 
was then generated in the following manner: 

A. PCR primers were designed, such that the top strand upstream primer 
(Fragment A top) would prime at the Kpn I site of full-length Factor VIII cDNA (Figure 12), 
and the bottom strand downstream primer (Fragment A bottom) would prime at the site of 
insertion for the 5' splice donor. The bottom strand primer also contained the insertion 
sequence. These primers were used in a PCR reaction with pCIS-F8 (full-length Factor VIII 
cDNA expression plasmid) as template to yield "Fragment A," which contains the sequence 
spanning the region of Factor VIII cDNA from Kpn I to the splice donor insertion site, 
located at the 3' end of the fragment. 

B. In similar fashion, "Fragment B" was generated using primer "Fragment B 
top," which contains the insertion sequence and would prime at the insertion site of 
full-length Factor VIII cDNA, and primer "Fragment B bottom," which would prime at the 
Tth111 I site of full-length Factor VIII cDNA. "Fragment B" contains the sequence spanning 
the region of Factor VIII cDNA from the splice donor insertion site to Tth111 I. The 5' splice 
donor insertion sequence was located at the 5' end of the fragment. 

C. Fragments A and B were run on a horizontal agarose gel, excised, and 
extracted, in order to purify them away from unincorporated nucleotides and primers. 

D. These fragments were then combined in a PCR reaction using as primers 
"Fragment A top" and "Fragment B bottom." The regions at the 3' end of Fragment A and 
the 5' end of Fragment B overlapped because they were identical, and the final product of this 
reaction was a PCR fragment spanning the Factor VIII cDNA from Kpn I to Tth111 I, and 
containing the engineered splice donor at the insertion site, i.e., near the beginning of the 
coding region of the β-domain of Factor VIII. This fragment was designated "Fragment AB." 

E. Fragment AB (an overlap PCR product) was cloned into the EcoR V site of 
pBluescript II SK(+) to yield clone pBS-SD (Figure 9), and the sequence of the insertion was 
then confirmed. 

Generation of Splice Acceptor Site 
A fragment containing the region of Factor VIII cDNA from the second Bcl I site to 
the BspE I site, with the above described splice acceptor sequence inserted at the appropriate 
spot, was generated in the following manner: 

A. PCR primers were designed, such that the top strand upstream primer (Primer 
A) would prime at the second Bcl I site, and the bottom strand downstream primer (Primer 
B2) would prime at the insertion site for the 3' splice acceptor. The bottom strand primer also 
contained the restriction sites Mun I and BspE I. These primers were used in a PCR reaction 
with pCIS-F8 as template to yield "Fragment I," which contains the sequence spanning the 
region of Factor VIII cDNA from the Bcl I site to the insertion site, with the Mun I and BspE 
I sites located at the 3' end of the fragment. 

B. In a similar fashion, "Fragment III" was generated using "Primer G3," which 
contains the restriction site BstE II and the splice acceptor recognition sequence 
(polypyrimidine tract followed by "CAG"), and primes at the insertion site for the splice 
acceptor; and "Primer H," which would prime the bottom strand at the BspE I site, so that the 
resulting fragment would contain the restriction site BstE II, the splice acceptor recognition 
site and sequence spanning the region of Factor VIII cDNA from the insertion site to BspE I. 

C. "Fragment II," which contained the branch signals and IVS 14 sequence, was 
generated by designing four oligos (C2, D, E, and F3), two top and two bottom, which, when 
combined, would overlap each other by 21 to 22 bases, and when filled in and amplified 
under PCR conditions, would generate a fragment containing a Mun I site, 130 bases of the 
aforementioned IVS 14 sequence (including the 2 branch sequences at the 5' end of the 130 
bases), and the cloning sites BstE II and BspE I. In addition, two small primers (CX and 
FX2) were designed that would prime at the very ends of the expected fragment, in order to 
increase amplification of full-length PCR product. All oligonucleotide primers were 
combined in a single PCR reaction, and the desired fragment was generated. 

D. All three fragments were cloned into the EcoR V site of pBluescript II SK(+), 
and their sequences were then confirmed. 

E. Fragment II was isolated out of pBluescript as a Mun I to BspE I fragment, 
and cloned into the pBluescript-Fragment I clone at the corresponding sites, to yield clone 
pBS-FI/FII (Figure 9). Fragment III was isolated out of pBluescript as a BstE II to BspE I 
fragment, and cloned into the corresponding sites of pBS-FI/FII to yield pBS-FI/FII/FIII 
(Figure 9). This final Bluescript clone contained the region spanning Factor VIII cDNA from 
the second Bcl I site to the BspE I site, and contained the IVS 14 and splice acceptor 
sequence inserted at the appropriate sites. The pBS-FI/FII/FIII clone was then sequenced. 

Cloning Splice Donor and Acceptor Sites into a Factor VIII cDNA Vector (pCY-6) 

Fragment AB and Fragment I/II/III were isolated out of pBluescript and cloned into 
pCY-6 in the following manner: 

A. Fragment I/II/III was isolated from pBS-FI/FII/FIII as a Bcl I to BspE I 
fragment. 

B. pCY-601 was digested to completion with BspE I, linearizing the plasmid. 
This linear DNA was partially digested with Bcl I for 5 minutes, and then immediately run on 
a gel. The band corresponding to a fragment which had been cut only at the BspE I and the 
second Bcl I site was isolated and extracted from the agarose gel. This isolated fragment was 
ligated to Fragment I/II/III and yielded pCY-601/FI/FII/FIII (Figure 9). 

C. Fragment AB was isolated from pBS-SD as a Kpn I to Tth111 I fragment, and 
cloned into the corresponding sites of pCY-601/FI/FII/FIII to yield pLZ-601. 

D. Plasmids pCY-6 and pLZ-601 were digested sequentially with enzymes Nco I 
and Sal I. The small fragment of the pCY-6 digest and the large fragment of the pLZ-601 
digest were isolated and ligated together to yield plasmid pLZ-6, a second β-domain intron 
Factor VIII expression plasmid. 

pCY-6 and pCY-601 are expression plasmids for full-length Factor VIII cDNA. The 
difference between the two is that the former contains an intron in the 5' untranslated region 
of the Factor VIII transcript, derived from the second IVS of the rabbit beta globin gene. The 
latter lacks this engineered IVS. In vitro experiments have shown that pCY-601 yields 
undetectable levels of Factor VIII, while pCY-6 yields low but detectable Factor VIII levels. 

Expression Assays 

To test expression of the various Factor VIII cDNA plasmids including those created 
as described above, plasmids were transfected at a concentration of 2.0-2.5 µg/ml into HuH-7 
human carcinoma cells using the calcium phosphate precipitation method described by 
O'Mahoney et al. (1994) DNA & Cell Biol. 13(12): 1227-1232. Expression levels were 
measured using the KabiCoATest (Kabi Inc., Sweden). This is both a quantitative and a 
qualitative assay for measuring Factor VIII expression, because it measures enzymatic 
activity of Factor VIII. 

Reverse Transcriptase-PCR Analysis of Cells Transfected With Factor VIII 
Expression Plasmids 

To confirm that the engineered intron spanning the β-domain of the Factor VIII 
cDNA in plasmid pLZ-6 resulted in proper splicing of the β-domain coding region, reverse 
transcriptase (RT)-PCR analysis was performed as follows: 

HuH-7 cells in T-75 flasks were transfected via CaPO4 precipitation with 36 µg of 
each of the following DNA plasmids: 

pCY-2    β-domain deleted human Factor VIII cDNA 
pCY-6    Full-length human Factor VIII cDNA 
pLZ-6    Full-length human Factor VIII cDNA with engineered β-domain intron 

75 ng of pCMVhGH was co-transfected as a transfection control. Untransfected cells 
were grown alongside as a negative control. 

Total RNA was isolated from cells 24 hours post-transfection using Gibco BRL 
Trizol reagent, according to the standard protocol included in the product insert. 

RT-PCR experiments were performed as follows: RT-PCR was performed on all 
RNA preps to characterize RNA. "Minus RT" PCR was performed on all RNA preps as a 
negative control (without RT, only DNA is amplified). PCR was performed on plasmids 
used in transfection assays to compare with RT-PCRs of the RNA preps. All RT-PCR was 
performed with the Access RT-PCR system (Promega, Cat. #A1250). In each 50 µl reaction, 
1.0 µg total RNA was used as template. Primer pairs were designed according to Factor VIII 
sequences as follows: the 5' primer anneals to the top strand of Factor VIII, about 250 base 
pairs upstream of the β-domain junction; while the 3' primer anneals to the bottom strand of 
Factor VIII, about 250 base pairs downstream of the β-domain junction. 

The nucleotide sequences of the primers used to characterize (i.e., confirm) the 
β-domain intron splicing were as follows: 

5' primer TS 2921-2940: 5' TGG TCT ATG AAG ACA CAC TC 3' 
(20 mer) 

3' primer BS 6261-6280: 5' TGA GCC CTG TTT CTT AGA AC 3' 
(20 mer) 



RT-PCR files were set up according to the manufacturer's recommendation: 

48°C, 45 minutes; x 1 cycle 
94°C, 2 minutes; x 1 cycle 
94°C, 30 sec; x 40 cycles 
60°C, 1 min; x 40 cycles 
68°C, 2 min; x 40 cycles 
68°C, 7 min; x 1 cycle 
4°C, soak overnight 
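
As a rough planning aid, the program above can be written down as data and its total hold 
time summed. This is only a sketch: the tuple layout and names are hypothetical, ramp times 
between temperatures are ignored, and the open-ended 4°C soak is left out. 

```python
# (temperature in °C, hold time in minutes, number of cycles), transcribed from
# the program listed above; the tuple layout itself is a hypothetical choice.
REVERSE_TRANSCRIPTION = (48, 45.0, 1)
INITIAL_DENATURATION = (94, 2.0, 1)
CYCLED_STEPS = [(94, 0.5, 40), (60, 1.0, 40), (68, 2.0, 40)]
FINAL_EXTENSION = (68, 7.0, 1)


def total_hold_minutes(steps):
    """Sum hold time x cycle count; ramp times and the 4°C soak are ignored."""
    return sum(minutes * cycles for _temp, minutes, cycles in steps)


program = [REVERSE_TRANSCRIPTION, INITIAL_DENATURATION, *CYCLED_STEPS, FINAL_EXTENSION]
print(total_hold_minutes(program))  # 194.0
```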

The data obtained from the RT-PCR assays demonstrated that the engineered β-domain 
intron was spliced as predicted. The RT-PCR product (~500 bp) generated from pLZ-6 
(containing the β-domain intron) was similar to that obtained from pCY-2 (containing 
β-domain deleted Factor VIII cDNA). The RT-PCR product observed for pCY-6 (containing 
the full-length Factor VIII cDNA) yielded a much larger band (~3.3 kb). 
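
The size of the larger band can be sanity-checked from the primer names alone. The sketch 
below assumes, without confirmation from the specification, that the numbers in "TS 2921-2940" 
and "BS 6261-6280" are inclusive positions in one shared numbering of the full-length Factor 
VIII sequence; the helper name is hypothetical. 

```python
# Hypothetical helper; coordinates are taken from the primer names
# (TS 2921-2940 and BS 6261-6280) and are assumed to be inclusive positions
# in a single shared numbering of the full-length Factor VIII sequence.
def amplicon_length(forward_start, reverse_end):
    """Length in bp of a PCR product spanning both primers, ends inclusive."""
    return reverse_end - forward_start + 1


full_length_product = amplicon_length(2921, 6280)
print(full_length_product)  # 3360
```

Under those assumptions the unspliced product is 3360 bp, in line with the roughly 3.3 kb 
band reported for pCY-6. 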

In the control groups, it was confirmed that DNA amplified from the HuH-7 cells 
transfected with the various Factor VIII constructs was consistent with the regular PCR 
results of the corresponding plasmids. Background bands from untransfected HuH-7 cells were 
presumably contributed by cross-over during sample handling. This can be further 
investigated by using polyA+ RNA as template, as well as by setting up RT-PCR with 
different primer sets. 

EXAMPLE 2 - Correction of Consensus and Near Consensus Splice Sites Within a 
Human Factor VIII Gene 

Plasmid pCY-2, containing the coding region of the β-domain deleted human Factor 
VIII cDNA (nucleotides 1006-5379 of SEQ ID NO:2), was analyzed using the MacVector™ 
program for consensus and near consensus (a) splice donor sites, (b) splice acceptor sites and 
(c) branch sequences. Near consensus 5' splice donor sites were selected using the following 
criteria: sites were required to contain at least 5 out of the 9 splice donor consensus bases 
(i.e., (C/A)AGGT(A/G)AGT), including the invariant GT, provided that if only 5 out of 9 
bases were present, these 5 bases were located consecutively in a row. Near consensus 3' 
splice acceptor sites were selected using the following criteria: sites were required to contain 
at least 3 out of the following 14 splice acceptor consensus bases (Y=10)CAGG (wherein Y 
is a pyrimidine within the pyrimidine tract), including the invariant AG. Only branch 
sequences which were 100% consensus were searched for. 
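
The splice donor criteria can be expressed as a small scanning routine. The following is a 
minimal sketch of the stated 5-of-9 rule only; it is not the MacVector™ search, and the 
function names are hypothetical. It assumes the invariant GT is the GT immediately after the 
exon/intron boundary, i.e., the fourth and fifth bases of (C/A)AGGT(A/G)AGT. 

```python
# Consensus 5' splice donor, one set of allowed bases per position:
# (C/A) A G G T (A/G) A G T. The invariant GT is assumed to sit at
# indices 3 and 4 (the GT immediately after the exon/intron boundary).
DONOR_CONSENSUS = ["CA", "A", "G", "G", "T", "AG", "A", "G", "T"]
INVARIANT = (3, 4)


def is_near_consensus_donor(window):
    """Apply the stated 5-of-9 rule to a single 9-base window."""
    if len(window) != len(DONOR_CONSENSUS):
        return False
    hits = [base in allowed for base, allowed in zip(window.upper(), DONOR_CONSENSUS)]
    if not all(hits[i] for i in INVARIANT):
        return False  # the invariant GT must be present
    matches = sum(hits)
    if matches < 5:
        return False
    if matches == 5:
        # Exactly five matches: they must form one consecutive run.
        first = hits.index(True)
        return all(hits[first:first + 5])
    return True


def find_near_consensus_donors(seq):
    """Return 0-based start positions of all qualifying 9-base windows."""
    return [i for i in range(len(seq) - 8) if is_near_consensus_donor(seq[i:i + 9])]
```

For example, `find_near_consensus_donors("TTCAGGTAAGTTT")` reports the single window 
starting at index 2, which is the full consensus sequence. 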

Using these criteria, 23 near consensus 5' splice donor sequences, 22 near consensus 
3' splice acceptor sequences, and 18 consensus branch sequences were identified. No 
consensus 5' splice donor or 3' splice acceptor sequences were identified. To correct these 
near consensus splice donor and acceptor sequences, and consensus branch sequences, it was 
first determined whether the invariant GT, AG, or A bases within the site could be substituted 
without changing the coding sequence of the site. If they could be, then these conservative 
(silent) substitutions were made, thereby rendering the site non-consensus (since the invariant 
bases are required for recognition as a splice site). 

If the invariant bases within selected consensus and near consensus sites could not be 
substituted without changing the coding sequence of the site (i.e., if no degeneracy existed for 
the amino acid sequence coded for), then the maximum number of silent point mutations 
were made to render the site as far from consensus as possible. All bases which contributed 
to homology of the consensus or near consensus site with the corresponding consensus 
sequence, and which were able to be conservatively substituted (with non-consensus bases), 
were mutated. 
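
Whether a given base can be substituted silently depends only on its codon and the standard 
genetic code. The sketch below illustrates that check; the function name is hypothetical, and 
it handles one codon at a time rather than reproducing the selection of the mutations 
actually made. 

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order at each position,
# with the third position varying fastest.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def silent_substitutions(codon, position):
    """Bases that can replace codon[position] without changing the amino acid."""
    codon = codon.upper()
    original = CODON_TABLE[codon]
    options = []
    for base in BASES:
        if base == codon[position]:
            continue
        mutated = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[mutated] == original:
            options.append(base)
    return options
```

For instance, the A of an AGG arginine codon can be changed to C (CGG is also arginine), 
which would disrupt an invariant AG falling on that codon, whereas no base of the ATG 
methionine codon can be changed silently. 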

Using these guidelines, 99 silent point mutations were selected, as shown in Figures 
4A-4C. The positions of these silent point mutations are shown in Figure 3. 

To prepare a new pCY-2 human β-domain deleted Factor VIII cDNA coding 
sequence which contains the above-described corrections, the following procedure can be 
used: 

Overlapping 60-mer oligonucleotides can be synthesized based on the coding 
sequence of pCY-2. Each of the 185 oligonucleotides contains the desired corrections. These 
oligonucleotides are then assembled in five segments (shown in Figure 9) using the method 
of Stemmer et al. (1995) Gene 164: 49-53. Prior to assembly, each segment can be 
sequenced and tested in in vitro transfection assays (nuclear and cytoplasmic RNA analysis) 
in pCY-2. A schematic illustration of this process is shown in Figure 8. The plasmid 
containing the new corrected coding sequence is designated "pDJC." 
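
Splitting a sequence into overlapping fixed-length oligonucleotides can be illustrated as 
follows. This is a generic tiling sketch, not the Stemmer et al. assembly method and not the 
design that produced the 185 oligonucleotides: the overlap length is not given in the text, so 
the default of 20 bases is an arbitrary placeholder, only one strand is tiled, and the 
function name is hypothetical. 

```python
def tile_oligos(sequence, length=60, overlap=20):
    """Split `sequence` into consecutive oligos of `length` bases that overlap
    their neighbours by `overlap` bases (the last oligo may be shorter).

    The default overlap is a placeholder; the source does not specify one.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    step = length - overlap
    oligos = []
    start = 0
    while True:
        oligos.append(sequence[start:start + length])
        if start + length >= len(sequence):
            break
        start += step
    return oligos
```

Each oligo shares its last `overlap` bases with the start of the next one, which is the 
property an overlap-based assembly relies on. 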

To test expression levels of pDJC, the plasmid can be transfected at a concentration of 
2.0-2.5 µg/ml into HuH-7 human carcinoma cells using any suitable transfection technique, 
such as the calcium phosphate precipitation method described by O'Mahoney et al. (1994) 
DNA & Cell Biol. 13(12): 1227-1232. Factor VIII expression can then be measured using the 
KabiCoATest (Kabi Inc., Sweden). This is both a quantitative and a qualitative assay for 
measuring Factor VIII expression, because it measures enzymatic activity of Factor VIII. 
Alternatively, plasmids such as pDJC can be tested for in vivo expression using the procedure 
described below in Example 4. 

EXAMPLE 3 - Optimized Expression Vectors 

Optimized expression vectors for liver-specific and endothelium-specific human 
Factor VIII expression were prepared and tested as follows: 

The β-domain deleted human Factor VIII cDNA was obtained through Bayer 
Corporation in plasmid p25D, having a coding sequence corresponding to nucleotides 
1006-5379 of SEQ ID NO:2. The human thyroid binding globulin promoter (TBG) (bases 
-382 to +3) was obtained by PCR from human liver genomic DNA (Hayashi et al. (1993) Mol. 
Endo. 7:1049). The human endothelin-1 (ET-1) gene promoter (Lee et al. (1990) J. Biol. 
Chem. 265(18)) was synthesized by amplification of overlapping oligos in a PCR reaction. 

After sequence confirmation, the TBG and ET-1 promoters were cloned into two 
separate vectors upstream of an optimized leader sequence (SEQ ID NO:11), using standard 
cloning techniques. The leader sequence was designed in a similar manner to that reported by 
Kozak et al. (1994) J. Mol. Biol. 235:95 and synthesized (Retrogen Inc., San Diego, CA) as 
71 base pair top and bottom strand oligos, annealed and cloned upstream of the Factor VIII 
ATG. The 126 base pair intron-1 of the rabbit β-globin gene, containing the nucleotide 
sequence modifications shown in Figure 23 (SEQ ID NO:7), was also synthesized and inserted 
into the leader sequence following base 42 of the 71 nucleotide sequence. 

In the construct containing the TBG promoter, top and bottom strands of the human 
alpha-1 microglobulin/bikunin enhancer (ABP), sequences -2804 through -2704 (Rouet et al. 
(1992) J. Biol. Chem. 267:20765), were synthesized, annealed and cloned upstream of the 
promoter. Cloning sites flanking the enhancer were designed to facilitate easy 
multimerization. In the construct containing the ET-1 promoter, top and bottom strands of the 
human c-fos SRE enhancer (Treisman et al. (1986) Cell 46) were synthesized, annealed and 
cloned upstream of the promoter. 

The post-transcriptional regulatory element (PRE) from hepatitis B virus was isolated 
from plasmid Adw-HTD as a 587 base-pair Stu I-Stu I fragment. It was cloned into the 3' 
UTR of the Factor VIII construct (at the Hpa I site) containing the TBG promoter and ABP 
enhancers, upstream of the polyadenylation sequence. A two copy PRE element was isolated 
as a Spe I-Spe I fragment from an early vector where two copies had ligated together. This 
fragment was converted to a blunt end fragment by the Klenow fragment of E. coli DNA 
polymerase I and also cloned into the Factor VIII construct at the same Hpa I site. 

Thus, the following constructs were produced using the foregoing materials and 
methods: 

Plasmid pCY-2 having a 5' untranslated region containing the TBG promoter, two 
copies of the ABP enhancer, and the modified rabbit β-globin IVS, all upstream of the human 
β-domain deleted Factor VIII gene. 

Plasmid pCY2-SE5 which was identical to pCY-2, except that the TBG promoter was 
replaced by the ET-1 gene promoter, and the ABP enhancers (both copies) were replaced by 
one copy of the SRE enhancer. 

Plasmid pCY-201 which was identical to pCY-2, except that it lacked the 5' intron. 

Plasmids pCY-401 and pCY-402 which were identical to pCY-201, except that they 
contained one and two copies of the HBV PRE, respectively. 

Expression levels for each of the foregoing gene constructs were compared in human 
hepatoma cells (HuH-7) maintained in DMEM (Dulbecco's modified Eagle medium; 
GIBCO BRL), supplemented with 10% heat inactivated fetal calf serum (10% FCS), 
penicillin (50 IU/ml), and streptomycin (50 µg/ml) in a humidified atmosphere of 5% CO2 at 
37°C. For experiments involving quantitation of human Factor VIII protein, media was 
supplemented with an additional 10% FCS. DNA transfection was performed by a calcium 
phosphate coprecipitation method. 

Other human Factor VIII gene constructs (shown below in Table I) tested for 
expression, prepared as described above, included constructs which were identical to pCY-2, 
except that they contained (a) the TBG promoter with no enhancer or 5' intron, (b) the TBG 
promoter with a 5' modified rabbit β-globin intron (present within the leader sequence), but no 
enhancer, (c) the TBG promoter with one copy of the ABP enhancer and a 5' modified rabbit 
β-globin intron (present within the leader sequence), and (d) the TBG promoter with two 
copies of the ABP enhancer and a 5' modified rabbit β-globin intron (present within the leader 
sequence). 

Active Factor VIII protein was measured from tissue culture supernatants by COAtest 
VIII:C/4 kit assay specific for active Factor VIII protein. Transfection efficiencies were 
normalized to expression of cotransfected human growth hormone (hGH). 

As shown below in Table I, liver-specific human Factor VIII expression is 
significantly increased by the combined use of the TBG promoter and a 5' intron within the 5' 
UTR of the gene construct. Expression is further increased (over 30 fold) by adding a copy of 
the ABP enhancer in the same construct. Expression is still further increased (over 60 fold) by 
using two copies of the ABP enhancer in the same construct. In addition, as shown in Figure 
18, expression is also significantly increased by adding one or more PRE sequences into the 3' 
UTR of the gene construct, although, in this experiment, not as much as by adding a 5' intron 
within the 5' UTR. 



TABLE I 

5' Region Tested                                    Fold Increase in Factor 
                                                    VIII Expression In Vitro 

TBG Promoter                                        1 

TBG Promoter, 5' Intron                             3.5 

ABP Enhancer (1 copy),                              30.1 
TBG Promoter, 5' Intron 

ABP Enhancer (2 copies),                            63.2 
TBG Promoter, 5' Intron 
(pCY-2) 
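
The contribution of each added element can be read off Table I by dividing each fold 
increase by that of the preceding construct. The sketch below does this arithmetic on the 
tabulated values; the variable and function names are hypothetical. 

```python
# Fold increases copied from Table I, in the order the elements were added.
FOLD_INCREASE = [
    ("TBG Promoter", 1.0),
    ("TBG Promoter, 5' Intron", 3.5),
    ("ABP Enhancer (1 copy), TBG Promoter, 5' Intron", 30.1),
    ("ABP Enhancer (2 copies), TBG Promoter, 5' Intron (pCY-2)", 63.2),
]


def stepwise_gains(rows):
    """Ratio of each construct's fold increase to that of the previous construct."""
    return [(rows[i][0], rows[i][1] / rows[i - 1][1]) for i in range(1, len(rows))]


for name, gain in stepwise_gains(FOLD_INCREASE):
    print(f"{name}: {gain:.1f}x over previous construct")
```

On these numbers the 5' intron accounts for a 3.5-fold step, the first ABP enhancer copy for 
about 8.6-fold, and the second copy for about 2.1-fold. 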




Expression of pCY2-SE5 was also tested and compared with pCY-2 in (a) bovine 
aortic endothelial cells and (b) HuH-7 cells. Transfections and assays were performed as 
described above. Significantly more biologically active human Factor VIII was secreted from 
cells transfected with pCY2-SE5 than with pCY-2 (625 pg/ml vs. 280 pg/ml). While 
liver-specific pCY-2 expressed more than 10 ng/ml of human Factor VIII from HuH-7 cells, no 
human Factor VIII could be detected from pCY2-SE5 transfected HuH-7 cells. 

Constructs were also tested in vivo. Specifically, pCY-2 and pCY2-SE5 were tested 
in mouse models by injecting mice (tail vein) with 10 µg of DNA in 1.0 ml of solution 
(0.3 M NaCl, pH 9). Plasmids pCY-6, pLZ-6 and pLZ-6A (described in Example 1) were 
tested in the same experiment. Levels of human Factor VIII were measured in mouse serum. 
The results are shown in Figure 19. Plasmid pCY-2, containing the TBG promoter, 2 copies 
of the ABP enhancer, and an optimized 5' intron, had the highest expression, followed by 
pLZ-6A, pLZ-6, pCY2-SE5 and pCY-6. 

Plasmid pCY-2 was also tested in vivo in mice, along with plasmid p25D which 
contained the same coding sequence (for human β-domain deleted Factor VIII) without an 
optimized 5' UTR. Specifically, instead of 2 copies of the ABP enhancer, one copy of the 
TBG promoter and a leader sequence containing an optimized (i.e., modified to contain 
consensus splice donor and acceptor sites and a consensus branch and pyrimidine tract 
sequence) 5' rabbit β-globin intron (as contained in the 5' UTR of pCY-2), p25D contained 
within its 5' UTR one copy of the CMV enhancer, one copy of the CMV promoter, and a 
leader sequence containing an unmodified short (130 bp) chimeric human IgE intron 
(containing uncorrected near consensus splice donor and acceptor sites). Plasmids were 
injected into mice (tail vein) in the form of asialoorosomucoid/polylysine/DNA complexes 
formed as described below in Example 4. Mice were injected with 10 µg of DNA 
(complexed) in 1.0 ml of solution (0.3 M NaCl, pH 9). 

The results are shown in Figure 25 and demonstrate that optimization of gene
constructs by modification of 5' UTRs to contain novel combinations of strong tissue-specific
promoters and enhancers, and optimized introns (e.g., modified to contain consensus splice
donor and acceptor sites and a consensus branch and pyrimidine tract sequence) significantly
increases both levels and duration of gene expression. Notably, expression of p25D shut off
after only 8 days, whereas expression of pCY-2 was maintained at nearly 100% of initial
levels (well within the human therapeutic range of 10 ng/ml or more) for over 10 days. In the
same experiment, expression was maintained well within the therapeutic range for greater
than 30 days.

Overall, the results of the foregoing examples demonstrate that gene expression can
be significantly increased and prolonged in vivo by optimizing untranslated regulatory
regions and/or coding sequences in accordance with the teachings of the present invention.

EXAMPLE 4 - Targeted Delivery of Novel Genes to Cells 

Novel genes of the invention, such as novel Factor VIII genes contained in
appropriate expression vectors, can be selectively delivered to target cells either in vitro or in
vivo as follows:

Formation of Targeted Molecular Complexes 

I. Reagents

Protamine, poly-L-lysine (4kD, 10kD, 26kD; mean MW) and ethidium bromide can
be purchased from Sigma Chemical Co., St. Louis, MO.
1-[3-(dimethylamino)-propyl]-3-ethylcarbodiimide (EDC) can be purchased from Aldrich
Chemical Co., Milwaukee, WI. Synthetic polylysines can be purchased from Research
Genetics (Huntsville, AL) or Dr. Schwabe (Protein Chemistry Facility at the Medical
University of South Carolina). Orosomucoid (OR) can be purchased from Alpha
Therapeutics, Los Angeles, CA. Asialoorosomucoid (AsOR) can be prepared from
orosomucoid (15 mg/ml) by hydrolysis with 0.1 N sulfuric acid at 76°C for one hour. AsOR
can then be purified from the reaction mixture by neutralization with 1.0 N NaOH to pH 5.5
and exhaustive dialysis against water at room temperature. AsOR concentration can be
determined using an extinction coefficient of 0.92 ml mg⁻¹ cm⁻¹ at 280 nm. The
thiobarbituric acid assay of Warren (1959) J. Biol. Chem. 234:1971-1975 or of Uchida
(1977) J. Biochem. 82:1425-1433 can be used to verify desialylation of the OR. AsOR
prepared by the above method is typically 98% desialylated.
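
The concentration determination described above is a direct application of the
Beer-Lambert relation. The following is a minimal sketch, not part of the original
disclosure; the function name, the default 1 cm path length and the dilution-factor
parameter are assumptions introduced only for illustration.

```python
def asor_concentration_mg_per_ml(a280, path_cm=1.0, ext_coeff=0.92, dilution_factor=1.0):
    """Estimate AsOR concentration (mg/ml) from absorbance at 280 nm.

    ext_coeff is the extinction coefficient quoted in the text
    (0.92 ml mg^-1 cm^-1). path_cm and dilution_factor are illustrative
    parameters (assumptions), not values taken from the specification.
    """
    if path_cm <= 0 or ext_coeff <= 0 or dilution_factor <= 0:
        raise ValueError("path length, extinction coefficient and dilution must be positive")
    if a280 < 0:
        raise ValueError("absorbance cannot be negative")
    # Beer-Lambert: A = ext_coeff * concentration * path length
    return a280 / (ext_coeff * path_cm) * dilution_factor


if __name__ == "__main__":
    # A reading of 0.46 in a 1 cm cuvette on an undiluted sample
    print(round(asor_concentration_mg_per_ml(0.46), 3))
```

For a sample diluted before measurement, the dilution factor is passed so that the
returned value refers to the undiluted stock.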

II. Formation of Carrier Molecules

Carrier molecules capable of electrostatically binding to DNA can be prepared as
follows: AsOR-poly-L-lysine conjugate (AP26K) can be formed by carbodiimide coupling
similar to that reported by McKee (1994) Bioconj. Chem. 5:306-311. AsOR, 26kD
poly-L-lysine and EDC in a 1:1:0.5 mass ratio can be reacted as follows. EDC (dry) is added
directly to a stirring aqueous AsOR solution. Polylysine (26 kD) is then added, the reaction
mixture adjusted to pH 5.5-6.0, and stirred for two hours at ambient temperature. The
reaction can be quenched by addition of Na3PO4 (200 mM, pH 11) to a final concentration of
10 mM. The AP26K conjugate can be first purified on a Fast Flow Q Sepharose anion
exchange chromatography column (Pharmacia) eluted with 50 mM Tris, pH 7.5; and then
dialyzed against water.
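
The reagent quantities implied by the 1:1:0.5 mass ratio and by the quench step
(200 mM stock brought to a final concentration of 10 mM) can be worked out as in the
sketch below. It is illustrative only: the function names and the example batch size are
hypothetical, and the quench calculation assumes that volumes are additive on mixing.

```python
def coupling_masses_mg(asor_mg, ratio=(1.0, 1.0, 0.5)):
    """Return (AsOR, polylysine, EDC) masses in mg for a given AsOR mass.

    ratio is the AsOR:polylysine:EDC mass ratio from the text (1:1:0.5).
    """
    asor_part, pl_part, edc_part = ratio
    if asor_mg <= 0 or min(ratio) <= 0:
        raise ValueError("masses and ratio parts must be positive")
    return (asor_mg, asor_mg * pl_part / asor_part, asor_mg * edc_part / asor_part)


def quench_volume_ml(reaction_ml, stock_mM=200.0, final_mM=10.0):
    """Volume of phosphate stock to add to reach final_mM in the mixture.

    Solves stock_mM * v = final_mM * (reaction_ml + v) for v, assuming
    additive volumes (an assumption of this sketch).
    """
    if not 0 < final_mM < stock_mM:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    return reaction_ml * final_mM / (stock_mM - final_mM)


if __name__ == "__main__":
    # Hypothetical batch: 100 mg AsOR in a 19 ml reaction volume
    print(coupling_masses_mg(100.0))
    print(quench_volume_ml(19.0))
```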

III. Calculation of Charge Ratios (+/-) 

Charge ratios of purified carrier molecules can be determined as follows:
Protein-polylysine conjugates (e.g., AsOR-PL or OR-PL) are exhaustively dialyzed against
ultra-pure water. An aliquot of the dialyzed conjugate solution is lyophilized, weighed and
dissolved in ultra-pure water at a specific concentration (w/v). Since polylysine has minimal
absorbance at 280 nm, the AsOR component of AsOR-polylysine (w/v) is calculated using the
extinction coefficient at 280 nm. The composition of the conjugate is estimated by
comparison of the concentration of the conjugate (w/v) with the concentration of AsOR (w/v)
as determined by UV absorbance. The difference between the two determinations can be
attributed to the polylysine component of the conjugate. The composition of OR-polylysine
can be calculated in the same manner. The ratio of conjugate to DNA (w/w) necessary for
specific charge ratios then can be calculated using the determined conjugate composition.
Charge ratios for molecular complexes made with, e.g., polylysine or protamine, can be
calculated from the amino acid composition.
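
The calculation outlined above can be sketched as follows. This is a hedged
illustration rather than the procedure actually used: it assumes one positive charge per
lysine residue (about 128.17 g/mol for the residue, ignoring any counterion), one negative
charge per DNA nucleotide (about 330 g/mol on average), and a negligible charge
contribution from the AsOR protein. All three are assumptions of this sketch, and the
function names are hypothetical.

```python
LYS_RESIDUE_G_PER_MOL = 128.17   # assumed mass per positive charge (lysine residue)
NUCLEOTIDE_G_PER_MOL = 330.0     # assumed average mass per negative charge (nucleotide)


def polylysine_fraction(conjugate_mg_per_ml, asor_mg_per_ml):
    """Mass fraction of polylysine in the conjugate.

    conjugate_mg_per_ml: dry-weight concentration of the conjugate (w/v).
    asor_mg_per_ml: AsOR concentration from absorbance at 280 nm (w/v).
    The difference is attributed to polylysine, as described in the text.
    """
    if conjugate_mg_per_ml <= 0 or not 0 <= asor_mg_per_ml <= conjugate_mg_per_ml:
        raise ValueError("need 0 <= AsOR concentration <= conjugate concentration")
    return (conjugate_mg_per_ml - asor_mg_per_ml) / conjugate_mg_per_ml


def conjugate_to_dna_w_w(charge_ratio, pl_fraction,
                         lys_mass=LYS_RESIDUE_G_PER_MOL,
                         nt_mass=NUCLEOTIDE_G_PER_MOL):
    """Conjugate:DNA mass ratio (w/w) giving the requested +/- charge ratio.

    charge_ratio = (w_conj * pl_fraction / lys_mass) / (w_dna / nt_mass)
    so w_conj / w_dna = charge_ratio * lys_mass / (pl_fraction * nt_mass).
    """
    if charge_ratio <= 0 or not 0 < pl_fraction <= 1:
        raise ValueError("charge ratio must be positive and 0 < pl_fraction <= 1")
    return charge_ratio * lys_mass / (pl_fraction * nt_mass)


if __name__ == "__main__":
    frac = polylysine_fraction(2.0, 1.0)      # hypothetical measurements
    print(frac, conjugate_to_dna_w_w(1.0, frac))
```

If the polylysine is weighed as a salt (for example a hydrobromide), the mass per
positive charge is larger than the value assumed here and should be passed explicitly.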

IV. Complexation With DNA 

To form targeted DNA complexes, DNA (e.g., plasmid DNA) is preferably prepared
in glycine (e.g., 0.44 M, pH 7), and is then rapidly added to an equal volume of carrier
molecule, also in glycine (e.g., 0.44 M, pH 7), so that the final solution is isotonic.

V. Fluorescence Quenching Assay 

Binding efficiencies of DNA to various polycationic carrier molecules can be
examined using an ethidium bromide-based quenching assay. Solutions can be prepared
containing 2.5 µg/ml EtBr and 10 µg/ml DNA (1:5 EtBr:DNA phosphates molar ratio) in a
total volume of 1.0 ml. The polycation is added incrementally with fluorescence readings
taken at each point using a fluorometer (e.g., a Sequoia-Turner 450), with excitation and
emission wavelengths at 540 nm and 585 nm, respectively. Fluorescence readings are
preferably adjusted to compensate for the change in volume due to the addition of polycation,
if the added polycation does not exceed 3% of the original volume. Results can be reported
as the percentage of fluorescence relative to that of uncomplexed plasmid DNA (no
polycation).
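
The arithmetic behind the assay can be sketched as below. The sketch reads both the
EtBr and DNA concentrations as µg/ml, since the stated 1:5 molar ratio is only consistent
with that reading. The molar masses (ethidium bromide about 394.3 g/mol, an average of
about 330 g/mol per nucleotide), the function names, and the treatment of the 3% figure
as an upper limit for the simple dilution correction are assumptions of this illustration.

```python
ETBR_G_PER_MOL = 394.3         # ethidium bromide, approximate molar mass
NUCLEOTIDE_G_PER_MOL = 330.0   # assumed average mass per DNA phosphate


def etbr_to_phosphate_molar_ratio(etbr_ug_per_ml, dna_ug_per_ml):
    """Molar ratio of EtBr to DNA phosphates for mass concentrations in µg/ml."""
    if etbr_ug_per_ml <= 0 or dna_ug_per_ml <= 0:
        raise ValueError("concentrations must be positive")
    return (etbr_ug_per_ml / ETBR_G_PER_MOL) / (dna_ug_per_ml / NUCLEOTIDE_G_PER_MOL)


def relative_fluorescence_percent(reading, uncomplexed_reading,
                                  original_ml, added_ml, max_added_fraction=0.03):
    """Volume-corrected fluorescence as a percentage of the no-polycation value.

    The reading is scaled by (original + added) / original to undo dilution.
    The correction is refused if the added volume exceeds max_added_fraction
    of the original volume (this sketch's reading of the 3% condition).
    """
    if original_ml <= 0 or added_ml < 0 or uncomplexed_reading <= 0:
        raise ValueError("volumes and reference reading must be valid and positive")
    if added_ml > max_added_fraction * original_ml:
        raise ValueError("added volume exceeds the limit for a simple dilution correction")
    corrected = reading * (original_ml + added_ml) / original_ml
    return 100.0 * corrected / uncomplexed_reading


if __name__ == "__main__":
    print(round(etbr_to_phosphate_molar_ratio(2.5, 10.0), 3))
    print(round(relative_fluorescence_percent(50.0, 100.0, 1.0, 0.02), 2))
```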

Cell Delivery In Vivo or In Vitro 

DNA complexes prepared as described above can be administered in solution to
subjects via injection. By way of example, a 0.1-1.0 ml dose of complex in solution can be
injected intravenously via the tail vein into adult (e.g., 18-20 gm) BALB/C mice, at a dose
ranging from <1.0 to 10.0 µg of DNA complex per mouse.

Alternatively, DNA complexes can be incubated with cells (e.g., HuH cells) in culture
using any suitable transfection protocol known in the art for targeted uptake. Target cells for
transfection must contain on their surface a component capable of binding to the cell-binding
component of the DNA complex.

EQUIVALENTS 

Although the invention has been described with reference to its preferred
embodiments, other embodiments can achieve the same results. Those skilled in the art will
recognize or be able to ascertain using no more than routine experimentation, numerous
equivalents to the specific embodiments described herein. Such equivalents are considered to
be within the scope of this invention and are encompassed by the following claims.

INCORPORATION BY REFERENCE

The contents of all references and patents cited herein are hereby incorporated by
reference in their entirety.

What is claimed is:

1. An isolated DNA comprising one or more consensus or near consensus splice
sites which have been corrected to increase expression of the DNA.

2. The isolated DNA of claim 1 comprising a cDNA clone.

3. The isolated DNA of claim 1, wherein the one or more consensus or near
consensus splice sites are corrected by conservative mutation of at least one consensus
nucleotide.

4. The isolated DNA of claim 3, wherein the maximum number of conservative
mutations are made within the one or more consensus or near consensus splice sites.

5. The isolated DNA of claim 1 wherein the one or more consensus or near
consensus splice sites comprises a 5' splice donor site which is corrected by mutating one or
both of the nucleotides within the essential GT pair.

6. The isolated DNA of claim 1 wherein the one or more consensus or near
consensus splice sites comprises a 3' splice acceptor site which is corrected by mutating one
or both of the nucleotides within the essential AG pair.

7. The isolated DNA of claim 1 comprising a nucleotide sequence which encodes
a Factor VIII protein.

8. The isolated DNA of claim 1 comprising a cDNA which is expressed as a
β-domain deleted Factor VIII protein.

9. The isolated DNA of claim 8 comprising the nucleotide sequence shown in
SEQ ID NO:1.

10. The isolated DNA of claim 1 comprising the coding region of a full-length
Factor VIII gene, wherein the coding region contains an intron spanning all or a portion of
the gene encoding the β-domain.

11. The isolated DNA of claim 8 further comprising a second intron upstream of
the coding region.

12. An isolated DNA comprising the coding region of a full-length Factor VIII
gene, wherein the coding region contains an intron spanning the portion of the gene encoding
the β-domain.

13. The isolated DNA of claim 12 comprising the coding region of the nucleotide
sequence shown in SEQ ID NO:3.

14. The isolated DNA of claim 12 further comprising one or more consensus or
near consensus splice sites which have been corrected.

15. An isolated DNA which is expressed as a β-domain deleted Factor VIII
protein, said DNA comprising the coding region of a full-length Factor VIII gene modified to
(a) correct one or more consensus or near consensus splice sites within the coding region and
(b) to incorporate an intron into the coding region which spans the portion of the gene
encoding the β-domain.

16. The isolated DNA of claim 15 which encodes a human β-domain deleted
Factor VIII protein.

17. An expression vector comprising the isolated DNA of claim 1 operably linked
to a promoter sequence.

18. An expression vector comprising the isolated DNA of claim 7 operably linked
to a promoter sequence.

19. An expression vector comprising the isolated DNA of claim 10 operably
linked to a promoter sequence.

20. An expression vector comprising the isolated DNA of claim 12 operably
linked to a promoter sequence.

21. A molecular complex comprising the expression vector of claim 17 linked to
an agent which binds to a component on the surface of a mammalian cell.

22. A molecular complex comprising the expression vector of claim 18 linked to
an agent which binds to a component on the surface of a mammalian cell.

23. A molecular complex comprising the expression vector of claim 19 linked to
an agent which binds to a component on the surface of a mammalian cell.

24. A molecular complex comprising the expression vector of claim 20 linked to
an agent which binds to a component on the surface of a mammalian cell.

25. A method of increasing expression of a gene comprising correcting one or
more consensus or near consensus splice sites within the nucleotide sequence of the gene.

26. The method of claim 25 wherein the step of correcting the one or more
consensus or near consensus splice sites comprises conservatively mutating one or more
consensus nucleotides within the consensus or near consensus splice site.

27. The method of claim 25 wherein the step of correcting the one or more
consensus or near consensus splice sites comprises making the maximum number of
conservative mutations possible to consensus nucleotides within the consensus or near
consensus splice site.

28. The method of claim 25 comprising mutating one or both of the nucleotides
within the essential GT pair, if the consensus or near consensus splice site is a 5' splice site,
or mutating one or both of the nucleotides within the essential AG pair, if the consensus or
near consensus splice site is a 3' splice site.

29. The method of claim 28 wherein the gene encodes a Factor VIII protein.

30. The method of claim 25 wherein the gene is expressed as a β-domain deleted
Factor VIII protein.

31. The method of claim 30 wherein the gene comprises the nucleotide sequence
shown in SEQ ID NO:1.

32. The method of claim 25 wherein the gene comprises the coding region of a
full-length Factor VIII gene, and the method further comprises the step of inserting an intron
into the coding region of the gene so that the intron spans all or a portion of the segment of
the gene encoding the β-domain.

33. The method of claim 32 further comprising inserting a second intron upstream
of the coding region of the gene.

34. A method of increasing expression of a gene encoding Factor VIII comprising
inserting into the coding region of the gene an intron which spans all or a portion of the
portion of the gene encoding the β-domain.

35. The method of claim 34 further comprising correcting one or more consensus
or near consensus splice sites within the Factor VIII gene by conservative mutation of a
consensus nucleotide.

36. A method of increasing expression of a gene encoding Factor VIII comprising
correcting one or more consensus or near consensus splice sites within the gene.

37. The method of claim 36 wherein the correction is made by conservative
mutation of a consensus nucleotide located within the coding region of the gene.

38. A method of producing Factor VIII comprising introducing the expression
vector of claim 19 into a host cell capable of expressing the vector, and allowing for
expression of the vector.

39. A method of producing Factor VIII comprising introducing the expression
vector of claim 20 into a host cell capable of expressing the vector, and allowing for
expression of the vector.

40. An expression vector comprising a liver-specific promoter and a liver-specific
enhancer, said promoter and enhancer being derived from different genes.

41. The expression vector of claim 40, wherein the promoter and enhancer are
located upstream from the coding sequence of a gene.

42. The expression vector of claim 41, wherein the coding sequence is expressed
as a β-domain deleted human Factor VIII protein.

43. The expression vector of claim 40, wherein the liver-specific promoter is the
human thyroid binding globulin promoter.

44. The expression vector of claim 40, wherein the liver-specific enhancer is the
alpha-1 microglobulin/bikunin enhancer.

45. The expression vector of claim 41 further comprising one or more introns
located (a) downstream from the promoter and enhancer and (b) upstream from the coding
sequence.

46. The expression vector of claim 45, wherein the intron is located within the
leader sequence of the gene.

47. The expression vector of claim 45, wherein the intron comprises one or more
consensus splice sites.

48. The expression vector of claim 46, wherein the leader sequence has no
secondary structure when transcribed as RNA.

49. The expression vector of claim 41, wherein the 3' untranslated region of the
gene is modified to increase processing, export or stability of the mRNA transcribed from the
gene.

50. An expression vector comprising the human thyroid binding globulin
promoter and the alpha-1 microglobulin/bikunin enhancer.

51. The expression vector of claim 50 comprising two or more copies of the
alpha-1 microglobulin/bikunin enhancer.

52. The expression vector of claim 50, wherein the human thyroid binding
globulin promoter and the alpha-1 microglobulin/bikunin enhancer are located upstream from
the coding sequence of a gene.

53. The expression vector of claim 52, wherein the coding sequence is also
preceded upstream by a leader sequence comprising one or more introns.

54. The expression vector of claim 51 wherein the coding sequence is expressed
as a β-domain deleted human Factor VIII protein.

55. The expression vector of claim 53, wherein the intron comprises a consensus
5' splice donor site, and a consensus 3' splice acceptor site.

56. The expression vector of claim 53, wherein the intron has no secondary
structure when transcribed as RNA.

ABSTRACT

Novel genes and vectors exhibiting increased expression and novel splicing patterns
are disclosed. The gene can comprise one or more consensus or near consensus splice sites
which have been corrected. The gene can alternatively or additionally comprise one or more
introns within coding or noncoding sequences. The gene can still further comprise modified
5' and/or 3' untranslated regions optimized to provide high levels and duration of
tissue-specific expression. In one embodiment, the gene comprises the coding region of a
full-length Factor VIII gene modified by adding an intron within the portion of the gene
encoding the β-domain, so that the gene is expressed as a β-domain deleted Factor VIII
protein. The novel Factor VIII gene can also be modified to correct one or more consensus
or near consensus splice sites within or outside of the coding region.
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5 10 15 20 25 30 35 40 45 

* * * * ** 

pDJCcoding ATG GAA ATA GAG CTC TCC ACC TGC TTC TOT CTG TGC COT TTG COA TTC 

I I I 1 

1. p25Dcod 10 20 30 40 

[ 16902 ] ATG GAA ATA GAG CTC TCC ACC TGC MC TOT CTG TGC COT TTG OGA TTC> 

pDJCcoding ATG GAA ATA GAG CTC TCC ACC TGC TTC TOT CTG TGC COT TTG OGA TTC 



50 55 60 65 70 75 80 85 SO 95 

* * * * * * * * * * 

pDJCcoding TGC TOT AGT GCC ACC AGA AGA TAC TAC CTG GGT GCA GTG GAA CTG TCA 

II III 
1. p25Dcod50 60 70 80 90 

( 16902 I TGC TOT AGT GCC ACC AGA AGA TAC TAC CTG GGT GCA GTG GAA CTG TCA> 

pDJCcoding TGC TOT AGT GCC ACC 'AGA AGA TAC TAC CTG GGT GCA GTG GAA CTG TCA 



100 105 110 115 120 125 130 135 140 
* * * * * * 

pDJCcoding TGG GAG TAT ATG CAA AGT GAT CTC GGA GAG CTG CCT GTG GAG GCA AGA 

I . t I I I 

1. p25Dcod 100 110 120 130 140 

{ 16902 J TGG GAG TAT ATG CAA AGT GAT CTC GGt GAG CTG CCT GTG „GAC GCA AGA> 
AM AAA AAA AA * AAA AAA AAA AA V AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding TGG GAC TAT ATG CAA AGT GAT CTC GGA GAG CTG CCT GTG GAC GCA AGA 

145 150 155 160 165 170 175 180 185 190 
* * * * * * * * * * 

pDJCcoding TOT OCT OCT CGC GTG OCA AAA TCT TOT OCA TTC AAC ACC TCA GTC GTG 

I III I 

1. p25Dcod ISO 160 170 180 190 

C 16902 ] TOT pCT OCT aGa GIG OCA AAA TCT TOT OCA TTC AAC ACC TCA GTC GTG> 

AAA AAA AAA qAq AAA AAA AAA AAA AAA AAA AAA* AAA AAA AAA AAA AAA 



PDJCcoding TOT OCT OCT CGC GTG OCA AAA TCT TOT OCA TTC AAC ACC TCA GTC GTG 

195 200 205 210. 215 220 225 230 235 240 
* * * # * * * * 

pDJCcoding TAC AAA AAG ACT CTG TOT GTA GAA TTC ACG GOT CAC COT TTC AAC ATC 

I I III 

1. p25Dcod 200 210 220 230 240 

t 16902 1 TAC AAA AAG ACT CTG TOT GTA GAA TTC ACG GOT CAC COT TTC AAC ATO 



PDJCcoding TAC AAA AAG ACT GTG TOT GTA GAA TTC ACG GOT CAC COT TTC AAC ATC 



Fife. 56 



245 250 2SS 260 265 270 27S 280 285 
* * * * * * * * * 

pDJCcoding OCT AAG CCA AGO CCA CCC TOO ATG GOT CTG CTA GOT CCT ACC ATC CAA 

I I I I 

1. p2SDCod 250 260 270 280 

[ 16902 1 OCT AAQ CCA AGO CCA CCC TOO ATG GOT CTO CTA GOT CCT ACC ATC CAg> 

AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAy 

pDJCcoding GCT AAG CCA AGG CCA CCC TGG ATG GOT CTO CTA GOT CCT ACC ATC CAA 

290 295 300 30S 310 315 320 325 330 335 
********** 
pDJCcoding GCT GAG GTT TAT GAT ACA GTG GTC ATT ACA CTT AAG AAC ATG GCT TCC 
II III 
1. p25Dco290 300 310 320 330 

( 16902 ] GCT GAG GTT TAT GAT ACA GTG GTC ATT ACA CTT AAG AAC ATO GCT TCC> 

pDJCcoding GCT GAG GTT TAT GAT ACA GTG GTC ATT -ACA -CTT AAG AAC ATO GCT TCC 

340 345 350 355 360 365 370 375 380 
**** * **** 

pDJCcoding CAT CCT GTC TCC CTT CAT GCT GTT GGT GTA TCC TAC TOO AAA GCT TCT 

1. p25Dcod 340 350 360 370 380 

( 16902 1 CAT CCT GTC agt CTT CAT GCT GTT GOT GTA TCC TAC TOO AAA GCT TCT> 

AAA AAA AAA AAA AAA AAA _ A. A 

pDJCcoding CAT CCT GTC TCC CTT CAT GCT GTT GGT- GTA TCC TAC TOG AAA GCT TCT 

385 390 395 400 405 410 415 420 425 430 
* « * ** * ** * * 

pDJCcoding GAG GGA GCT GAA TAT GAT GAT CAG ACC AGT CAA AGG GAG AAA GAA GAT 

1 o2SDcod 390 400 410 420 430 

t 16902 1 GAG GGA GCT GAA TAT GAT GAT CAG ACC AGT CAA AGG GAG AAA GAA GAT> 

AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA A A 

pDJCcoding GAG GGA GCT GAA'TAT GAT GAT CAG ACC AGT CAA AGG GAG AAA GAA GAT 

435 440 445 450 455 460 465 470 475 480 

pDJCcoding GAT AAA GTC TTC CCT GGT GGA AGC CAT ACA TAT GTC TOO CAA GTC CTO 

l ntsnead 440 450 460 -470 480 

I 16902 ] GAT AAA GTC TTC CCT GOT GGA AGC CAT ACA TAT GTC TGG CAg GTC CTG> 

l «»W J «~k AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA *«y 

pDJCcoding GAT AAA GTC TTC CCT GGT GGA AGC CAT ACA TAT GTC TOG CAA GTC CTG 

485 490 495 500 505 510 515 520 525 
* ** * .** * * 

pDJCcoding AAA GAG AAT GGT CCA ATG GCC TCC GAG CCA CTG TGC CTT ACC TAC TCA 

III ' 
l r>2Sncod 490 500 510 520 

t 16902 1 AAA GAG AAT GOT CCA ATG GCC TCt GAC CCA CTG TGC CTT ACC TAC TCA> 

pDJCcoding AAA GAG AAT GGT CCA ATO GCC TCC GAC CCA CTG TGC CTT ACC TAC TCA 



530 
* 



535 540 S4S 550 555 560 565 570 575 
* * • ** * ** *■ 



R6- 5C 



pDJCcoding TAT CTT TCT CAT GTG GAC CTG GTT AAA QAC TTO AAT TCA GGC CTC ATT 
I I I'M • ( 

1. p25DcoS30 540 550 560 570 

f 16902 J TAT CTT TCT CAT GTQ GAC CTG GTa AAA GAC TTG AAT TCA GGC CTC ATT> 

pDJCcoding TAT CTT TCT CAT GTG GAC CTG GTT AAA GAC TTG AAT TCA GGC CTC ATT 

580 585 590 595 600 605 610 .615 620 
***** **** 

pDJCcoding GGA GCC CTA CTA GTA TOT AGA GAA GGG AGT CTG GCC AAG GAA AAG ACA 

1(111 
1. p25Dcod 580 590 600 610 620 

( 16902 I GGA GCC CTA CTA GTA TGT AGA GAA GGG AGT CTG GCC AAG GAA AAG ACA> 

pDJCcoding GGA GCC CTA CTA GTA TGT AGA GAA GGG AGT CTG GCC AAG GAA AAG ACA 

625 630 635. 640 645 650 655 660 665 670 
** * * * * **** 

pDJCcoding CAG ACC TTG CAC AAA TTT ATA CTA CTT TTT GCT GTA TTT GAT GAA GGG 

111(1 
1* p25Dcod 630 640 650 660 670 

[ 16902 1 CAG ACC TTG CAC AAA TTT ATA CTA CTT TTT GCT GTA TTT GAT GAA QGG> 

pDJCcoding CAG ACC TTG CAC AAA TTT ATA CTA CTT TTT GCT GTA TTT GAT GAA GGG 

675 680 685 690 695 700 705 710 715 720 
* * * * * * * * * * 

pDJCcoding AAA AGT TOG CAC TCA GAA ACA AAG AAC TCC CTC ATG CAA GAT AGG GAT 

. . I I I I I 

1. p2SDcod . 680 690 700 710 720 

( 16902 1 AAA AGT TOG CAC TCA GAA ACA AAG AAC TCC tTg ATG CAg GAT AGG GAT> 

AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA y A y AAA AAy> AAA AAA AAA 

PDJCcoding AAA AGT TOG CAC TCA GAA ACA AAG AAC TCC CTC ATG CAA GAT AGG GAT 



725 730 735 740 745 750 755 760 765 
* * * * * * ♦ * * * . 

PDJCcoding OCT GCA TCT GCT COG GCC TOG OCT AAA ATG CAC ACA GTC AAT GGT TAT 

I I I I 

1. p25Dcod 730 740 7S0 760 

( 16902 1 OCT GGA TCT GCT COG GCC TGG OCT AAA ATG GAC ACA GTC AAT GGT TAT> 

AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA'- AAA AAA AAA AAA 

pDJCcoding GCT GCA TCT GCT COG GCC TGG CCT AAA ATG CAC ACA GTC AAT GOT TAT 

770 775 780 785 790 795 800 805 810 815 
* ********* 

pDJCcoding OTA AAC AGG AGC CTG CCA GGA CTG ATT GGA TGC GAC AGG AAA TCA GTC 

II III 
1* P25DCO770 780 790 800 810 

C 16902 ] GTA AAC AGG tct CTG CCA GGfc GTG ATT GGA TGC GAC AGG AAA TCA GTO 

AAA AAA AAA 



PDJCcoding GTA AAC AGG AGC CTG CCA GGA CTG ATT GGA TGC GAC AGG AAA TCA GTC 



820 825 830 835 840 845 850 855 860 
***** **** 

pDJCcoding TAT TGG GAT GTT ATA GGA ATG GGC ACC ACT GCT GAA GIG GAC TCA ATA 

III ( ( 



t i^T^Txf^ CAT •» Mt B. «C *« JOT 0=f<L OTO Cw'Sk ATA> 

AAA AAA AAA AAm AAw AAA AAA 

proceeding TAT TOO CAT GTT ATA GOA ATG GGC ACC ACT CCT GAA QTQ CAC TCA ATA 



865 870 875 880 885 890 895 900 905 910 

* * * * * 



pDJCcoding TTC CTC QAA GGA CAC ACA TTT CTT GTT AOA AAC CAT CGC CAQ GCG TCC 

f lMoP?° 0d TTC CTC GAA GQt CA^ACA TTT CTT^GTg AQg AAC CAT CGC CAQ ^TCO 
[ 16902 1 TTC CTC GAA GOT uu. *^ ^ v ~* AAA AAA AAA AAA AAA 

pDJCCDding TTC CTC GAA GQA CAC ACA TTT CTT GTT AQA AAC CAT CGC CAG GCG TCC 



915 920 925 930 935 940 945 950 955 960 
pDJCcodinfl TTG GAA ATC TCG CCA ATA ACT TTC CTT ACT GOT CAA ACA CTC CTC ATG 

( IbSff^TTG GAA 9 ATC TCG CCA ATA ACT TTC C^AOT GOT cJaCA CTC tTg ATG> 
pDJCcoding TTG GAA. ATC CCA ATA ACT TTC CTT ACT GOT CAA ACA CTC CTC ATG 

965 970 975 980 985 .996 995 1000 1005 

preceding GAC CTT GGA CAG TTT CTA CTG TTT TGT CAT ATC TCT TCC CAC CAA CAT 

, ,«v,^ <wn 980 990 1000 

t 16902 ^ GAC CTT GgTcAG TTT OTa'cTG ^TOTC^ATC^T~C^C^CAT> 

proceeding GAC CTT GSA.CAS TTT CTA. CTG TTT TGT CAT ATC TCT TCC CAC CAA CAT 

1010 1015 1020 1025 1030 1035 1040 ^1045 1050 1055 
proceeding GAT GGC ATG GAA GCT TAT GTC AAA GTA GAC AGC TGT CCA GAG GAA CCC 

j. , n ,i 1030 1040 1050 

1. p25Dcl010 1020 *JJ »«* -q-. qao qrr. COO 

t 16902 ] GAT GGC ATG GAA OCT TAT GTC AAA OTA GAC AOC Wt ««• ^ aaa 

1 1 AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA A 

proceeding GAT GGC ATG GAA GCT TAT GTC AAA GTA GAC AGC TGT CCA GAG GAA CCC 
1060 106S 1070 1075 1080 1085 1090 1095 1100 



proceeding CAA CTA CGA ATG AAA AAT AAT GAA. GAA GCG GAA GAC TAT GAT GAT GAT 
t ibSlT^L CGA ATG°AAA AAT AAt'gAA GAA GCG GAa'gAC TAT GAt'gAT GAT> 
pMCceding CAA CTA. CGA ATG AAA AAT AAT GAA GAA. GCG GAA GAC TAT GAT GAT GAT 

1105 1110 HIS 1120 1125 1130 113S 1140 1145 11S0 
proceeding CTT ACC GAT TCT GAA ATG GAT GTQ GTC AGA TTT GAT GAT GAC AAC TCT 

( irfTWi* GAT TCT GAA^ATO GAT «7Se AGg TTT^GAT GAT GAC 



FI6. 5£ 



pDJCcoding CTT ACC GAT TCT QAA ATQ GAT GTG GTC AGA TTT GAT GAT GAC AAC TCT 

11S5 1160 1165 1170 1175 1180 1165 1130 1195 1200 

pDJCcoding CCT TCC TTT ATC CAA ATT CGC TGA GTT GCC AAG AAG CAT CCT AAA ACT 

(I I I f 

1. p25Dcod 1160 1170 1180 1190 120 o 

( 16902 ] OCT TCC TTT ATC CAA ATT CGC TCA GTT GCC AAG AAG -CAT CCT AAA ACT> 

pDJCcoding CCT TCC TTT ATC CAA ATT CGC TCA GTT GCC AAG AAG CAT CCT AAA ACT 

1205 1210 1215 1220 1225 1230 1235 1240 1245 
* * * * * * *:*-* 

pDJCcoding TOG GTA CAT TAC ATT GCT GCT GAA GAG GAG GAC TGG GAC TAT GCT CCC 

I I I I 

1. p25Dcod 1210 1220 1230 1240 

[ 16902 ] TGG GTA CAT TAC ATT GCT GCT GAA GAG GAG GAC TGG GAC TAT GCT CCO 

pDJCcoding TGG GTA CAT TAC ATT GCT GCT GAA GAG GAG GAC TGG GAC TAT GCT CCC 

1250 1255 1260 1265 1270 1275 1280 1285 1290 1295 
* * * * * * * * * * 

pDJCcoding TTA GTC CTC GCC CCC GAT GAC AGA ACT TAT AAA AGT CAA TAT TTG AAC 

I i I I I 

1* p25Dcl250 1260 1270 1280 1290 

( 16902 ) TTA GTC CTC GCC CCC GAT GAC AGA AGT TAT AAA' AGT CAA TAT TTG AAC> 

pDJCcoding TTA GTC CTC GCC CCC GAT GAC AGA AGT TAT AAA AGT CAA TAT TTG AAC 

1300 1305 1310 1315 1320' 1325 1330 1335 1340 
* * * * * * *•* -* 

pDJCcoding AAT GGC CCT GAG OGG ATT GGA AGG AAG TAC AAA AAA GTC CGA TTT ATG 

i i i r i 

1. p25Dcod 1300 1310 1320 1330 1340 

[ 16902 ] AAT GGC CCT GAG COG ATT GOt AGO AAG TAC AAA AAA GTC CGA TTT ATG> 

AAA AAA AAA AAA AAA AAA A Ay AAA AAA AAA AAA AAA AAA AAA AAA AAA 

PDJCcoding AAT GGC CCT GAG OGG ATT GGA AGG AAG TAC AAA AAA GTC CGA TTT ATG 

1345 1350 1355 1360 1365 1370 1375 1380.. 1385 1390 
* * * * * * * * *-* 

pDJCcoding OCA TAC AGA GAT GAA ACC TTT AAG ACT CGT GAA GCT ATT CAG CAT GAA 

I I I I I 

1. p2SDcod 1350 1360 1370 1380 1390 

C 16902 ] OCA TAC AGA GAT GAA ACC TTT AAG ACT CGT GAA GCT ATT CAG CAT GAA> 

pDJCcoding GGA TAC ACA GAT GAA ACC TTT AAG ACT CGT GAA GCT ATT CAG CAT GAA 

1395 1400 1405 1410 1416 1420 1425 1430 1435 1440 
* ♦ # # * * ♦ * * * 

pDJCcoding TCA GGA ATC TTG GGA CCT TTA CTT TAT GGG GAA GTT GGA GAC ACA CTG 

I I I ( ( 

1* p25Dcod 1400 1410 1420 1430 1440 

[ 16902 ] TCA GGA ATC TTG GGA CCT TTA CTT TAT GGG GAA GTT GGA GAC ACA CT0> 

pDJCcoding TCA GGA ATC TTG GGA CCT TTA CTT TAT GGG GAA GTT GGA GAC ACA CTG 



Fl 6 • 5 F 



1445 1450 1455 1460 1465 1470 1475 1480 1485 
* * * * ***** 

pDJCcoding CTC ATT ATA TTT AAO AAT CAA GCA AGC AGA OCA TAT AAC ATC TAG OCT 

I I I I 

1« p25Dcod 1450 1460 1470 1480 

[ 16902 ] tTg ATT ATA TTT AAG AAT CAA GCA AGC AGA CCA TAT AAC ATC TAC CCT> 

v^v AAA AAA ^ AA AAA AAA AAA AAA A A AAA AAA 

pDJCcoding CTC ATT ATA TTT AAG AAT CAA GCA AGC AGA CCA TAT AAC ATC TAC CCT 

1490 1495 1500 1505 1510 1515 1520 1525 1530 1535 
********** 

pDJCcoding CAC GGA ATC ACC GAT GTC CGT CCT TTG TAT TCA OGC AGA TTA CCA AAA 
I I i " I • 

1. p25Dcl490 1500 1510 1520 1530 

( 16902 ] CAC GGA ATC ACt GAT GTC CGT CCT TTG TAT TCA aGg AGA TTA CCA AAA> 

AAA AAA AAA AAy AAA AAA AAA AAA AAA AAA -AAA yAy AAA AAA AAA AAA 

pDJCcoding CAC GGA ATC ACC GAT GTC CGT OCT TTG TAT TCA OGC AGA TTA OCA AAA 

1540 1545 1550 1555 1560 1565 1570 1575 1580 
** * * * * ** * 

pDJCcoding GGA GTA AAA CAT TTG AAG GAT TTT OCA ATT CTG OOC GGA GAA ATA TTC 
I I I i 1 

1. p25Dcod 1540 1550 1560 1570 1580 

C 16902 ] GGt GTA AAA GAT TTG AAG GAT TTT OCA ATT CTG CCa GGA GAA ATA TTC> 

AAy AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAy. AAA AAA AAA 

pDJCcoding GGA GTA AAA CAT TTG AAG GAT TTT CCA ATT CTG CCC GGA GAA ATA TTC 

1585 1590 1595 1600 1605 1610 1615 1620 1625 1630 
* * * ** * * - * * * 

pDJCcoding AAA TAT AAA TGG AGA GTG ACT GTA GAA GAT GOG OCA ACT AAA TCA GAT 

| | | 1 I 

1. p25Dcod 15S0 1600 1610 1620 ■ "30 

[ 16902 1 AAA TAT AAA TOG ACA GTG ACT GTA GAA GAT GOG OCA ACT AAA TCA GAT> 

AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding AAA TOT AAA TOG ACA GTG ACT GTA GAA GAT GGG OCA ACT AAA TCA GAT 

1635 1640 1645 1650 1655 1660 1665 1670 1675 1680 
********** 
pDJCcoding OCT COG TOC CTG ACC OGC TAT TAC TCT AGT TTC GTC. AAT ATG GAG AGA 

1. p25Dcod 1640 1650 1660 1670 1680 

( 16902 1 OCT COG TGC CTG. ACC OGC TAT TAC TCT AGT TTC GTt AAT ATG GAG AGA> 

pDJCcoding OCT COG TGC CTG ACC CGC TAT TAC TCT AGT TTC GTC AAT ATG GAG AGA 

1685 1690 1695 1700 1705 1710 1715 1720 1725 
********* 
pDJCcoding GAT OTA OCT TCA GGA CTC ATT OGC OCT CTC CTC ATC TGC TAC AAA GAA 

1. p25Dcod 1690 1700 1710 1720 

[ 16902 ] GAT OTA OCT TCA GGA CTC ATT OGC OCT CTC CTC ATC TGC TAC AAA GAA> 

pDJCcoding GAT CTA OCT TCA GGA CTC ATT OGC CCT CTC CTC ATC TGC TAC AAA GAA 
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1730 173S 1740 174S 1750 1755 1760 1765 1770 1775 
********** 
pDJCcoding TCT GTA GAT CAA AGA GGA AAC CAG ATA ATG TCA GAC AAG AGO AAT GTC 
II |l< 
1. P 2SDcl730 1740 1750 1760 1770 

t 16902 J TCT GTA GAT CAA AGA GGA. AAC CAG ATA ATG TCA GAC AAG AGG AAT GTC> 

pDJCcoding TCT GTA GAT CAA AGA GGA AAC CAG ATA ATG TCA GAC AAG AGG AAT GTC 

1780 1785 1790 1795 1800 1805 1810 1815 1820 
********* 
pDJCcoding ATC CTG TTT TCT GTA TTT GAT GAG AAC CGA AGC TGG TAC CTC ACA GAG 
||l I I 

1. p25Dcod 1780 1790 1800 1810 1820 

C 16902 1 ATC CTG TTT TCT QTA TTT GAT GAG AAC CGA AGC TOO TAC CTC ACA GAG> 

pDJCcoding ATC CTG TTT TCT GTA TTT GAT GAG AAC CGA AGC TOG TAC CTC ACA GAG 



1825 1830 1835 1840 1845 18S0 1855 1860 1865 1870 
********** 
pDJCcoding AAT ATA CAA CGC TTT CTC CCC AAT CCC GCT GGA GTG CAG CTT. GAG GAT 

I I I I I 

i „?siv.«i 1830 1840 1850 1860 1870 

t 16902 ] AAT ATA CAA CGC TTT CTC CCC AAT CCa GCT GGA GTG CAG CTT GAG GAT* 

AAA AAA AAA AAA AAA AAA AAA AAA AAy. AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding AAT ATA CAA CGC TTT CTC CCC AAT CCC GCT GGA GTG CAG CTT GAG GAT 

1875 1880 1885 1890 1895 1900 1905 1910 1915 1920 
pDJCcoding CCA GAG TTC CAA GCC TCC AAC ATC ATG CAC AGC ATC AAT GGC TAT GTT 

( i6902 5 r*CCA GAG 8 TTC CAA Go/tCC AAC A3PC JOTOC AGC AT^AAT GGC *£™> 
pDJCcoding CCA GAG TTC CAA GCC TCC AAC ATC ATG CAC AGC ATC AAT GGC TAT GTT 

1925 1930 1935 1940 1945 1950 1955 I960 1965 
* ** * ** * * 

pDJCcoding TTC GAT AQT TTG CAG TTG TCA GTT TGT TTG CAT GAA GTA OCA TAC TGG 

i «9ciw*4 1930 1940 1950 I960 

AAy AAA AAA AAA AAA AAA- AAA AAA AAA AAA AAA My **V 

pDJCcoding TTC GAT AGT TTG CAG TTG TCA GTT TOT TTG CAT GAA GTA OCA TAC TOG 



1970 



1975 1980 1985 1990 1995 2000 2005 2010 2015 



******* 



pDJCcoding TAC ATT CTA AGC ATT GGA OCA CAG ACT GAC TTC CTT TCT GTC TTC TTC 

1. p25Dcod 1970 1980 1990 2000 2010

[ 16902 ] TAC ATT CTA AGC ATT GGA OCA CAG ACT GAC TTC CTT TCT GTC TTC TTC>

pDJCcoding TAC ATT CTA AGC ATT GGA OCA CAG ACT GAC TTC CTT TCT GTC TTC TTC 
2020 2025 2030 2035 2040 2045 2050 2055 2060 



FIG. 6H



pDJCcoding TCT GGA TAT ACC TTC AAA CAC AAA ATG GTC TAT GAA GAC ACA CTC ACC

I I i i I 

1. p25Dcod 2020 2030 2040 2050 2060

[ 16902 ] TCT GGA TAT ACC TTC AAA CAC AAA ATG GTC TAT GAA GAC ACA CTC ACC>

pDJCcoding TCT GGA TAT ACC TTC AAA CAC AAA ATG GTC TAT GAA GAC ACA CTC ACC 



2065 2070 2075 2080 2085 2090 2095 2100 2105 2110
**********
pDJCcoding CTA TTC CCA TTC TCC GGA GAA ACT GTC TTC ATG TCG ATG GAA AAC CCA

I jll I 

1. p25Dcod 2070 2080 2090 2100 2110

[ 16902 ] CTA TTC CCA TTC TCa GGA GAA ACT GTC TTC ATG TCG ATG GAA AAC CCA>

AAA AAA AAA AAA AAy AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding CTA TTC CCA TTC TCC GGA GAA ACT GTC TTC ATG TCG ATG GAA AAC CCA



2115 2120 2125 2130 2135 2140 2145 2150 2155 2160 
* ********* 
pDJCcoding GGA CTA TGG ATT CTG GGG TGC CAC AAC TCA GAC TTT CGG AAC AGA GGC

I I I' I 

1. p25Dcod 2120 2130 2140 2150 2160

[ 16902 ] GGt CTA TGG ATT CTG GGG TGC CAC AAC TCA GAC TTT CGG AAC AGA GGC>

AAy AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding GGA CTA TGG ATT CTG GGG TGC CAC AAC TCA GAC TTT CGG AAC AGA GGC

2165 2170 2175 2180 2185 2190 2195 2200 2205 
* ** * ** * * * 

pDJCcoding ATG ACC GCC TTA CTG AAA GTT TCC AGT TGT GAC AAG AAC ACT GGA GAT

1 I 1 I 

1. p25Dcod 2170 2180 2190 2200 

[ 16902 ] ATG ACC GCC TTA CTG AAg GTT TCt AGT TGT GAC AAG AAC ACT GGt GAT>

AAA AAA AAA AAA AAA A Ay. AAA AAy AAA AAA AAA AAA AAA AAA AAy AAA 

pDJCcoding ATG ACC GCC TTA CTG AAA GTT TCC AGT TGT GAC AAG AAC ACT GGA GAT

2210 2215 2220 2225 2230 2235 2240 2245 2250 2255 
* * * * * * . * *.* * 

pDJCcoding TAT TAC GAG GAC AGT TAT GAA GAT ATT TCA GCA TAC TTG CTG AGT AAA
I I I I I

1. p25Dcod 2210 2220 2230 2240 2250

[ 16902 ] TAT TAC GAG GAC AGT TAT GAA GAT ATT TCA GCA TAC TTG CTG AGT AAA>

AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA' AAA AAA AAA .AAA 

pDJCcoding TAT TAC GAG GAC AGT TAT GAA GAT ATT TCA GCA TAC TTG CTG AGT AAA 



2260 2265 2270 2275 2280 2285 2290 2295 2300 
* * * ** * * * * 

pDJCcoding AAC AAT GCC ATT GAA CCA AGA AGC TTC TCC GAG AAC CCA CCA GTC TTG
I I I I I

1. p25Dcod 2260 2270 2280 2290 2300

[ 16902 ] AAC AAT GCC ATT GAA CCA AGA AGC TTC TCC GAG AAC CCA CCA GTC TTG>

pDJCcoding AAC AAT GCC ATT GAA CCA AGA AGC TTC TCC GAG AAC CCA CCA GTC TTG

2305 2310 2315 2320 2325 2330 2335 2340 2345 2350
********** 
pDJCcoding AAA OOC CAT CAA OGG GAA ATA ACT COT ACT ACT CTT GAA TCA GAT CAA 



FIG. 5I



1. p25Dcod 2310 2320 2330 2340 2350

( 16902 ] AAA CQC GAT CAA COO GAA ATA ACT GOT ACT ACT CTT CAg TCA GAT CAA> 

pDJCcoding AAA OOC GAT GAA COO GAA ATA ACT GOT ACT ACT CTT GAA TCA GAT GAA 
2355 2360 2365 2370 2375 2380 2385 2390 2395 2400 

* * * * * * * * * ^ 

pDJCcoding GAG GAA ATT GAC TAT GAT GAT ACC ATA TCA GTT GAA ATO AAG AAG GAA 

I I li| 
1. p25Dcod 2360 2370 2380 2390 2400 

( 16902 ] GAG GAA ATT GAC TAT GAT GAT ACC ATA TCA GTT GAA ATO AAG AAG GAA> 

pDJCcoding GAG GAA ATT GAC TAT GAT GAT ACC ATA TCA GTT GAA ATO AAG AAG GAA 

2405 2410 2415 2420 2425 2430 2435 2440 2445 
* * * * * * * - * * 

pDJCcoding GAT TTC GAC ATT TAT GAT GAG GAT GAA AAT CAG AGC CCC OOC AGC TTT 

I f I I 

1. p25Dcod 2410 2420 2430 2440 

( 16902 ] GAT TTt GAC ATT TAT GAT GAG GAT GAA AAT GAG AGC CCC OOC AGC TTT> 

AAA A^ AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding GAT TTC GAC ATT TAT GAT GAG GAT GAA AAT GAG AGC CCC.COC AGC TTT 

2450 2455 2460 2465 2470 2475 2480 2485 2490 2495 
* * * * * * * 

pDJCcoding GAA AAG AAA AGA CGA GAC TAT TTT ATT GCT GCA GTG GAG AGG CTC TGG

I I I I I 

1. p25Dcod 2450 2460 2470 2480 2490

[ 16902 ] GAA AAG AAA AGA CGA GAC TAT TTT ATT GCT GCA GTG GAG AGG CTC TGG>

AAA AAA AAA AAA AAA AAA AAA AAA AAA . AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding GAA AAG AAA AGA CGA GAC TAT TTT ATT GCT GCA GTG GAG AGG CTC TGG

2500 2505 2510 2515 2520 2525 2530 2535 2540
* * * * * * * * * 

pDJCcoding GAT TAT GOO ATO AGT AGC TCC CCA GAT GTT CTA AGA AAG AGO GCT GAG

I I I I I 

1. p25Dcod 2500 2510 2520 2530 2540

t 16902 ] GAT TAT GOO ATO AGT AGC TCC CCA GAT GTT CTA AGA AAG AGO GQT CAG> 

AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding GAT TAT GOO ATO AGT AGC TCC CCA CAT GTT CTA AGA AAC AGO GCT CAG 

2545 2550 2555 2560 2565 2570 2575 2580 2585 2590 
* * * * * * * * * * 

pDJCcoding AGT OOC AGT GTC OCT GAG TTC AAG AAA GTA GTA TTC GAG GAA TTT ACC 

I I 1 I I 

1. p25Dcod 2550 2560 2570 2580 2590 

[ 16902 ] AGT GOC AGT GTC CCT GAG TTC AAG AAA GTt GTt TTC GAG GAA TTT ACt>

AAA AAA AAA AAA AAA AAA AAA AAA AAA AAy AA^ AAA AAA AAA AAA AAy 

pDJCcoding AGT GOC AGT GTC CCT GAG TTC AAG AAA GTA GTA TTC GAG GAA TTT ACC 

2595 2600 2605 2610 2615 2620 2625 2630 2635 2640 

* * * * * ***** 

PDJCcoding GAT GOC TCC TTT ACT GAA CCC TTA TAG COT GOA GAA CTA AAT GAA GAT 

II III 
1. p25Dcod 2600 2610 2620 2630 2640

( 16902 1 GAT GOC TCC TTT ACT GA0 CCC TTA TAG GOT GOA GAA CTA AAT GAA CAT> 
AAA AAA AAA AAA AAA AAy AAA AAA AAA AA~ AAA AA~ —

pDJCcoding GAT GOC TOC TTT ACT CAA CCC TTA TAG COT OQA GAA CTA AAT GAA CAT 

2645 2650 2655 2660 2665 2670 2675 2680 2685 
* * * * ***** 

pDJCcoding TTG GGA CTC CTG GGG CCA TAT ATA AGA GCA GAA GTT GAA GAT AAT ATC 

I I I I 

1. p25Dcod 2650 2660 2670 2680 

( 16902 1 TTG GGA CTC CTG GGG CCA TAT ATA AOA GCA GAA GTT GAA GAT AAT ATO 

pDJCcoding TTG GGA CTC CTG GGG CCA TAT ATA AGA GCA GAA GTT GAA GAT AAT ATC 

2690 2695 2700 2705 2710 2715 2720 2725 2730 2735 
********** 
pDJCcoding ATG GTT ACC TTC AGA AAT CAG GCC TCT COT CCC TAT TCC TTC TAT TCT 
I I III 

1. p25Dcod 2690 2700 2710 2720 2730

( 16902 ] ATG GTa ACt TTC AGA AAT CAG GCC TCT COT CCC TAT TOC TTC TAT TCT> 

AAA AAy AAy AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding ATG GTT ACC TTC AGA AAT CAG GCC TCT GOT CCC TAT TOC TTC TAT TCT 

2740 2745 2750 2755 2760 2765 2770 2775 2780 
* *- * ** * ** * 

pDJCcoding TCC CTC ATA TCA TAT GAG GAA GAT CAG AGG CAA GGA GCA GAA OCT AGA

1. p25Dcod 2740 2750 2760 2770 2780 

[ 16902 ] aaC CTt ATt TCt TAT GAG GAA GAT CAG AGG CAA GGA GCA GAA OCT AGA>

AAy AA V AAy AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding TCC CTC ATA TCA TAT GAG GAA GAT CAG AGG CAA GGA GCA GAA OCT AGA

2785 2790 2795 2800 2805 2810 2815 2820 2825 2830 
********** 
pDJCcoding AAA AAC TTT GTC AAG OCT AAT GAA ACC AAA ACT TAG TTT TGG AAA GTG 

1. p25Dcod 2790 2800 2810 2820 2830 

C 16902 1 AAA AAC TTT GTC AAG OCT AAT GAA ACC AAA ACT TAG TTT TGG AAA GTG> 

AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding AAA AAC TTT GTC AAG OCT AAT GAA ACC AAA ACT TAC TTT TOG AAA GTG 

2835 2840 2845 2850 2855 2860 2865 2870 2875 2880
* * ** * ** * ** 

pDJCcoding CAA CAT CAT ATG GCA CCC ACT AAA GAT GAG TTT GAC TOC AAA GCC TGG
w~*~« ^ I | | I 

1. p25Dcod 2840 2850 2860 2870 2880

[ 16902 ] CAA CAT CAT ATG GCA CCC ACT AAA GAT GAG TTT GAC TOC AAA GCC TGG>

pDJCcoding CAA CAT CAT ATG GCA CCC ACT AAA GAT GAG TTT GAC TOC AAA GCC TGG 

2885 2890 2895 2900 2905 2910 2915 2920 2925 
* ** * ** * ** 

pDJCcoding OCT TAT TTC TCC GAT GTC GAC CTG GAA AAA GAT GTG CAC TCA GGC CTG

I I I I 

1. p25Dcod 2890 2900 2910 2920 

[ 16902 ] OCT TAT TTC TCt GAT GTt GAC CTG GAA AAA GAT GTG CAC TCA GGC CTG>

AAA AAA AAA A Ay. AAA AAy. AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding OCT TAT TTC TCC GAT GTC GAC CTG GAA AAA GAT GTG CAC TCA GGC CTG



FIG. 5K



2930 2935 2940 2945 2950 2955 2960 2965 2970 2975
********** 

pDJCcoding ATT GGA CCC CTT CTG GTC TGC CAC ACC AAC ACA CTG AAC CCT GCT CAT

( If I I 

1. p25Dcod 2930 2940 2950 2960 2970

[ 16902 ] ATT GGA CCC CTT CTG GTC TGC CAC ACt AAC ACA CTG AAC CCT GCT CAT>

pDJCcoding ATT GGA CCC CTT CTG GTC TGC CAC ACC AAC ACA CTG AAC CCT GCT CAT

2980 2985 2990 2995 3000 3005 3010 3015 3020 
** * ** * * * * 

pDJCcoding GGG AGA CAA GTG ACA GTA CAG GAA TTT GCT CTG TTT TTC AOC ATC TTC 

III It 
1. p25Dcod 2980 2990 3000 3010 3020

( 16902 ] GGG AGA CAA GTG ACA GTA CAG GAA TTT GCT CTG TTT TTC AOC ATC TTt> 

AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA A Ay. 

pDJCcoding GGG AGA CAA GTG ACA GTA CAG GAA TTT GCT CTG TTT TTC AOC ATC TTC 



3025 3030 3035 3040 3045 3050 3055 3060 3065 3070 
* * * * * ** * * . * * 

PDJCcoding GAT GAG AOC AAA AGO TGG TAC TTC ACT GAA AAT ATG GAA AGA AAC TGC 

I I I J I 

1. p25Dcod 3030 3040 3050 3060 3070

[ 16902 ] GAT GAG AOC AAA AGO TGG TAC TTC ACT GAA AAT ATG GAA AGA AAC TGC>

pDJCcoding GAT GAG AOC AAA AGO TGG TAC TTC ACT GAA AAT ATG GAA AGA AAC TGC 

3075 3080 3085 3090 3095 3100 3105 3110 3115 3120
***** ***** 

PDJCcoding AGG GCT CCC TGC AAT ATC CAG ATG GAA GAT CCC ACT TTT AAA GAG AAT 

I I I ! I 

1. p25Dcod 3080 3090 3100 3110 3120

[ 16902 ] AGG GCT CCC TGC AAT ATC CAG ATG GAA GAT CCC ACT TTT AAA GAG AAT>

AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA 

PDJCcoding AGG GCT CCC TGC AAT ATC CAG ATG GAA GAT CCC ACT TTT AAA GAG AAT 

3125 3130 3135 3140 3145 3150 3155 3160 3165 
********* 

pDJCcoding TAT CGC TTC CAT GCA ATC AAT GGC TAC ATA ATG GAT ACA CTA CCT GGC

I ( J \ 

1. p25Dcod 3130 3140 3150 3160 

[ 16902 ] TAT CGC TTC CAT GCA ATC AAT GGC TAC ATA ATG GAT ACA CTA CCT GGC>

AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding TAT CGC TTC CAT GCA ATC AAT GGC TAC ATA ATG GAT ACA CTA CCT GGC 

3170 3175 3180 3185 3190 3195 3200 3205 3210 3215
* * * * * * * ** * 

pDJCcoding TTA GTA ATG GCT CAG GAT CAA AGG ATT CGA TGG TAT CTG CTC AGO ATG

I I I I I 

1. p25Dcod 3170 3180 3190 3200 3210

[ 16902 ] TTA GTA ATG GCT CAG GAT CAA AGG ATT CGA TGG TAT CTG CTC AGO ATG>



pDJCcoding TTA GTA ATG GCT CAG GAT CAA AGG ATT CGA TGG TAT CTG CTC AGO ATG 



FIG. 5L



3220 3225 3230 3235 3240 3245 3250 3255 3260
***** **** 

pDJCcoding GGC AGC AAT GAA AAC ATC CAT TCT ATT CAT TTC TCC GGA CAT GTG TTC
I I I I I 

1. p25Dcod 3220 3230 3240 3250 3260 

[ 16902 ] GGC AGC AAT GAA AAC ATC CAT TCT ATT CAT TTC agt GGA CAT GTG TTC>

AAA AAA AA/V AAA AAA AAA AAA AAA AAA AAA AAA yyy AAA AAA AAA AAA 

pDJCcoding GGC AGC AAT GAA AAC ATC CAT TCT ATT CAT TTC TCC GGA CAT GTG TTC 

3265 3270 3275 3280 3285 3290 3295 3300 3305 3310 
****** **** 
pDJCcoding ACT GTA CGA AAA AAA GAG GAG TAT AAA ATG GCA CTG TAC AAT CTC TAT

| I I I I 

1. p25Dcod 3270 3280 3290 3300 3310 

[ 16902 ] ACT GTA CGA AAA AAA GAG GAG TAT AAA ATG GCA CTG TAC AAT CTC TAT>

pDJCcoding ACT GTA CGA AAA AAA GAG GAG TAT AAA ATG GCA CTG TAC AAT CTC TAT 

3315 3320 3325 3330 3335 3340 3345 3350 3355 3360 
********** 
pDJCcoding CCC GGA GTT TTC GAG ACA GTG GAA ATG TTA CCA TCC AAA GCT GGA ATT
I I I I I

1. p25Dcod 3320 3330 3340 3350 3360

[ 16902 ] CCa GGt GTT TTt GAG ACA GTG GAA ATG TTA CCA TCC AAA GCT GGA ATT>

AAy AA V AAA AAy AAA AAA AAA AAA. AAA AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding CCC GGA GTT TTC GAG ACA GTG GAA ATG TTA CCA TCC AAA GCT GGA ATT 

3365 3370 3375 3380 3385 3390 3395 3400 3405
********* 
pDJCcoding TGG CGG GTG GAA TGC CTT ATT GGC GAG CAT CTA CAT GCT GGG ATG AGC

1. p25Dcod 3370 3380 3390 3400 

[ 16902 ] TGG CGG GTG GAA TGC CTT ATT GGC GAG CAT CTA CAT GCT GGG ATG AGC>



AAA 



pDJCcoding TGG CGG GTG GAA TGC CTT ATT GGC GAG CAT CTA CAT GCT GGG ATG AGC

3410 3415 3420 3425 3430 3435 3440 3445 3450 3455
* *•* * ** * *. * * 

pDJCcoding ACA CTT TTT CTG GTG TAC TCC AAT AAG TOT CAG ACT CCC CTG GGA ATG
I I I I I

1. p25Dcod 3410 3420 3430 3440 3450

[ 16902 ] ACA CTT TTT CTG GTG TAC agC AAT AAG TOT CAG ACT CCC CTG GGA ATG>

1 * v » v * * <IXa AAA AAA AAA AAA AAA yyft ' AAA AAA AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding ACA CTT TTT CTG GTG TAC TCC AAT AAG TOT CAG ACT CCC CTG GGA ATG 

3460 3465 3470 3475 3480 3485 3490 3495 3500
*********
pDJCcoding GCT TCT GGA CAC ATT AGA GAT TTT CAG ATT ACA GCT TCA GGA CAA TAT
I I I I I

1. p25Dcod 3460 3470 3480 3490 3500

[ 16902 ] GCT TCT GGA CAC ATT AGA GAT TTT CAG ATT ACA GCT TCA GGA CAA TAT>

pDJCcoding GCT TCT GGA CAC ATT AGA GAT TTT CAG ATT ACA GCT TCA GGA CAA TAT
3505 3510 3515 3520 3525 3530 3535 3540 3545 3550 

* * * * * * 



* 



pDJCcoding GGA CAG TGG GCC CCA AAG CTG GCC AGA CTT CAT TAT TCC GGA TCA ATC

I I I I I

1. p25Dcod 3510 3520 3530 3540 3550

[ 16902 ] GGA CAG TGG GCC CCA AAG CTG GCC AGA CTT CAT TAT TCC GGA TCA ATC>

pDJCcoding GGA CAG TGG GCC CCA AAG CTG GCC AGA CTT CAT TAT TCC GGA TCA ATC

3555 3560 3565 3570 3575 3580 3585 3590 3595 3600 
* * * * * ***** 

pDJCcoding AAT GCC TGG AGC ACC AAG GAG CCC TOT TCT TGG ATC AAA GOT GAC CTG

I I I I I
1. p25Dcod 3560 3570 3580 3590 3600

[ 16902 ] AAT GCC TGG AGC ACC AAG GAG CCC TOT TCT TGG ATC AAg GTg GAt CTG>

AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AA^p AAy A Ay AAA 

pDJCcoding AAT GCC TGG AGC ACC AAG GAG CCC TOT TCT TGG ATC AAA GOT GAC CTG 

3605 3610 3615 3620 3625 3630 3635 3640 3645
* ** * * * *** 

pDJCcoding TTG GCA CCA ATG ATT ATT CAC GGC ATC AAG ACC CAG GGT GCC OGT CAG 

III I 
1. p25Dcod 3610 3620 3630 3640

[ 16902 ] TTG GCA CCA ATG ATT ATT GAC GGC ATC AAG ACC CAG GGT GCC COT CAG>

pDJCcoding TTG GCA CCA ATG ATT ATT GAC GGC ATC AAG ACC CAG GGT GCC CGT GAG 

3650 3655 3660 3665 3670 3675 3680 3685 3690 3695 
* * * *** * * * * 

PDJCcoding AAG TTC TCC AGC CTC TAG ATC TCT CAA TOT ATC ATC ATG TAT AGT CTC 

111(1 
1. p25Dcod 3650 3660 3670 3680 3690

[ 16902 ] AAG TTC TCC AGC CTC TAG ATC TCT GAg TOT ATC ATC ATG TAT AGT CTt>

AAA AAA AAA AAA AAA AAA AAA AAA AA^ AAA AAA AAA AAA AAA AAA AA^ 

pDJCcoding AAG TTC TCC AGC CTC TAG ATC TCT CAA TOT ATC ATC ATG TAT AGT CTC 

3700 3705 3710 3715 3720 3725 3730 3735 3740 
* * * *,* * * * * 

pDJCcoding GAT GGG AAG AAG TGG GAG ACT TAT GGA GGA AAT TCC ACT GGA ACC CTC

1 I I I I 

1. p25Dcod 3700 3710 3720 3730 3740

[ 16902 ] GAT GGG AAG AAG TGG GAG ACT TAT GGA GGA AAT TCC ACT GGA AGC tTa>



pDJCcoding GAT GGG AAG AAG TGG GAG ACT TAT GGA GGA AAT TCC ACT GGA ACC CTC 
3745 3750 3755 3760 3765 3770 3775 3780 3785 3790 

* * * * * * *** * 

pDJCcoding ATG GTC TTC TOT GGC AAT GTG GAT TCA TCT GGG ATA AAA CAC AAT ATT 

1 I I I I 

1. p25Dcod 3750 3760 3770 3780 3790 

t 16902 ] ATG GTC TTC TOT GGC AAT GTG GAT TCA TCT GGG ATA AAA GAC AAT ATT> 

pDJCcoding ATG GTC TTC TOT GGC AAT GTG GAT TCA TCT GGG ATA AAA GAC AAT ATT 



3795 3800 3805 3810 3815 3820 3825 3830 3835 3840 
* * *.* * * * * ** 

PDJCcoding TTC AAG CCT GCA ATT ATT GCT CQA TAG ATC OGT TTG GAC CCA ACT GAT 

1 I I I I 



1. p25Dcod 3800 3810 3820 3830 3840

[ 16902 ] TTt AAC OCT OCA ATT ATT OCT OQA TAC ATC COT TTG CAC CCA ACT CAT>

AAy AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding TTC AAC CCT CCA ATT ATT OCT OQA TAC ATC COT TTQ CAC CCA ACT CAT 



3845 3850 3855 3860 3865 3870 3875 3880 3885 
* * * * * * * * * * 

pDJCcoding TAT AGC ATT CGC AGC ACT CTT CGC ATG GAG TTG ATG GGC TOT GAT TTA

I i I I 

1. p25Dcod 3850 3860 3870 3880

( 16902 ] TAT AGC ATT CGC AGC ACT CTT CGC ATG GAG TTG ATG GGC TOT GAT TTA> 

pDJCcoding TAT AGC ATT CGC AGC ACT CTT CGC ATG GAG TTG ATG GGC TOT GAT TTA 

3890 3895 3900 3905 3910 3915 3920 3925 3930 3935 
* .* * * * * ** * * * 

pDJCcoding AAT AGT TGC AGC ATG CCA TTG GGA ATG GAG AGT AAA GCA ATA TCA GAT

I I I 1 1 

1. p25Dcod 3890 3900 3910 3920 3930

[ 16902 ] AAT AGT TGC AGC ATG CCA TTG GGA ATG GAG AGT AAA GCA ATA TCA GAT>

pDJCcoding AAT AGT TGC AGC ATG CCA TTG GGA ATG GAG AGT AAA GCA ATA TCA GAT 

3940 3945 3950 3955 3960 3965 3970 3975 3980 
* * * * * * * * * 

pDJCcoding GCA CAG ATT ACT GCT TCA TCC TAC TTT ACC AAT ATG TTT GCC ACC TGG
I I I I I

1. p25Dcod 3940 3950 3960 3970 3980

[ 16902 ] GCA CAG ATT ACT GCT TCA TCC TAC TTT ACC AAT ATG TTT GCC ACC TGG>

pDJCcoding GCA CAG ATT ACT GCT TCA TCC TAC TTT ACC AAT ATG TTT GCC ACC TGG

3985 3990 3995 4000 4005 4010 4015 4020 4025 4030 
* * * * * * * * * * 

pDJCcoding TCT OCT TCA AAA GCT OGA CTA CAC CTA CAA GGG AGG AGT AAT GCC TGG

1. p25Dcod 3990 4000 4010 4020 4030

[ 16902 ] TCT OCT TCA AAA GCT OGA CTt CAC CTC CAA GGG AGG AGT AAT GCC TGG>

i J TBW* AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AM 

pDJCcoding TCT OCT TCA AAA GCT OGA CTA CAC CTA CAA GGG AGG AGT AAT GCC TGG

4035 4040 4045 4050 4055 4060 4065 4070 4075 4080
********** 

pDJCcoding AGA CCT CAA GTT AAC AAT CCA AAA GAG TGG CTG CAA GTG GAC TTC CAG

( | J 1 

1. p25Dcod 4040 4050 4060 4070 4080

[ 16902 ] AGA CCT CAg GTg AAt AAT CCA AAA GAG TGG CTG CAA GTG GAC TTC CAG>

l J AAA AA^ AAy AAy AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding AGA CCT CAA GTT AAC AAT CCA AAA GAG TGG CTG CAA GTG GAC TTC CAG

4085 4090 4095 4100 4105 4110 4115 4120 4125 
********* 
pDJCcoding AAG ACA ATG AAA GTC ACA GGA GTA ACT ACT CAG GGA GTA AAA TCT CTG

| I I I 

1. p25Dcod 4090 4100 4110 4120

[ 16902 ] AAG ACA ATG AAA GTC ACA GGA GTA ACT ACT CAG GGA GTA AAA TCT CTG>



FIG. 5O



pDJCcoding AAG ACA ATG AAA GTC ACA GGA GTA ACT ACT CAG GGA GTA AAA TCT CTG

4130 4135 4140 4145 4150 4155 4160 4165 4170 4175 
********** 

pDJCcoding CTT ACC TCT ATO TAC GTG AAG GAG TTC CTC ATA TCG TCG TOG CAA GAT
I I I I I
1. p25Dcod 4130 4140 4150 4160 4170

[ 16902 ] CTT ACC agc ATO TAt GTG AAG GAG TTC CTC ATc TCc agc agt CAA GAT>

AAA AAA YVV A A^ AAA AAA AAA AAA AAA AAy AAy VW VW AAA 

pDJCcoding CTT ACC TCT ATO TAC GTG AAG GAG TTC CTC ATA TCG TCG TOG CAA GAT

4180 4185 4190 4195 4200 4205 4210 4215 4220 
* * * ** *.** * 

pDJCcoding GGC CAT CAG TOG ACT CTC TTT TTT CAA AAT GGC AAA GTA AAA GOT TTC 
I I 1 I I 

1. p25Dcod 4180 4190 4200 4210 4220

[ 16902 ] GGC CAT CAG TOG ACT CTC TTT TTT CAg AAT GGC AAA GTA AAg GOT TTt> 

AAA AAA AAA AAA AAA AAA AAA AAA AAy AAA AAA AAA AAA AA*y. AAA AA^ 

pDJCcoding GGC CAT CAG TOG ACT CTC TTT TTT CAA AAT GGC AAA GTA AAA GOT TTC 


4225 4230 4235 4240 4245 4250 4255 4260 4265 4270 
* . * * ** * ** * * 

pDJCcoding CAG GGA AAT CAA GAC TCC TTC ACA CCT GTC GTG AAC TCT CTA GAC CCA

I I I I I 

1. p25Dcod 4230 4240 4250 4260 4270 

( 16902 ] CAG GGA AAT CAA GAC TCC TTC ACA CCT GTg GTG AAC TCT CTA GAC CCA> 

AAA AAA AAA AAA AAA AAA AAA AAA AAA AAy A'AA AAA AAA AAA AAA AAA 

pDJCcoding CAG GGA AAT CAA GAC TCC TTC ACA CCT GTC GTG AAC TCT CTA GAC CCA

4275 4280 4285 4290 4295 4300 4305 4310 4315 4320 
********** 
pDJCcoding COG TTA CTC ACT CGC TAC CTT CGA ATT CAC CCC CAG AGT TOG GTG CAC

I I I I I

1. p25Dcod 4280 4290 4300 4310 4320

[ 16902 ] COG TTA CTg ACT CGC TAC CTT CGA ATT CAC CCC CAG AGT TOG GTG CAC>

AAA AAA AAy AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA 

pDJCcoding COG TTA CTC ACT CGC TAC CTT CGA ATT CAC CCC CAG AGT TOG GTG CAC 

4325 4330 4335 4340 4345 4350 4355 4360 4365
* * * * * . * * * 

pDJCcoding CAG ATT GOC CTG AGG ATO GAG GOT CTG GGC TOC GAG GCA CAG GAC CTC

1. p25Dcod 4330 4340 4350 4360

[ 16902 ] CAG ATT GOC CTG AGG ATO GAG GOT CTG GGC TOC GAG GCA CAG GAC CTC>

pDJCcoding CAG ATT GOC CTG AGG ATO GAG GOT CTG GGC TOC GAG GCA CAG GAC CTC 

4370 
* 

pDJCcoding TAC TOA 

1. p25Dcod 4370
[ 16902 ] TAC TOA> 

AAA AAA 

pDJCcoding TAC TOA
Declaration, Petition and Power of Attorney for Patent Application
As a below named inventor, I hereby declare that: 

My residence, post office address and citizenship are as stated below next to my name.

I believe I am the original, first and sole inventor (if only one name is listed below) or an 
original, first and joint inventor (if plural names are listed below) of the subject matter which 
is claimed and for which a patent is sought on the invention entitled 

NOVEL VECTORS AND GENES EXHIBITING INCREASED EXPRESSION 

the specification of which 

(check one) 

_ is attached hereto. 

X was filed on December 4, 1998 as

Application Serial No. 09/205,817

and was amended on . 

(if applicable) 

I do not know and do not believe that the subject matter of this application was known or 
used by others in the United States or patented or described in a printed publication in any 
country before my invention thereof, or patented or described in a printed publication in any 
country or in public use or on sale in the United States more than one year prior to the date of 
this application, or first patented or caused to be patented or made the subject of an inventor's 
certificate by me or my legal representatives or assigns in a country foreign to the United 
States prior to the date of this application on an application filed more than twelve months 
(six months if this application is for a design) before the filing of this application; and I 
acknowledge my duty to disclose information of which I am aware which is material to the 
examination of this application, that no application for patent or inventor's certificate on the 
subject matter of this application has been filed by me or my representatives or assigns in any 
country foreign to the United States, except those identified below, and that I have reviewed 
and understand the contents of the specification, including the claims as amended by any 
amendment referred to herein. 

I acknowledge the duty to disclose to the Office all information known to me to be material 
to patentability as defined in Title 37, Code of Federal Regulations, §1.56. 




CLAIM OF BENEFIT OF EARLIER FOREIGN APPLICATION(S) 



I hereby claim priority benefits under Title 35, United States Code, §119 of any foreign
application(s) for patent or inventor's certificate listed below, and have also identified below
any foreign application(s) for patent or inventor's certificate filed by me on the same subject 
matter having a filing date before that of the application(s) from which priority is claimed. 
X no such applications have been filed.

CLAIM FOR BENEFIT OF U.S. PROVISIONAL APPLICATION(S)



I hereby claim the benefit under 35 U.S.C. §119(e) of any United States provisional
application(s) listed below. 



60/071,596 January 16, 1998

(Application Serial No.) (Filing Date) 



60/067,614 December 5, 1997

(Application Serial No.) (Filing Date) 



CLAIM FOR BENEFIT OF EARLIER U.S./PCT APPLICATION(S) 



I hereby claim the benefit under Title 35, United States Code, §120 of any earlier United States 
application(s) or PCT international application(s) designating the United States listed below 
and, insofar as the subject matter of each of the claims of this application is not disclosed in the 
earlier application(s) in the manner provided by the first paragraph of Title 35, United States 
Code, §112, I acknowledge the duty to disclose to the Office all information known to me to be
material to patentability as defined in Title 37, Code of Federal Regulations, §1.56 which 
became available between the filing date(s) of the earlier application(s) and the national or 
PCT international filing date of this application. As to subject matter of this application which 
is common to my earlier application(s), if any, described below, I do not know and do not
believe that the same was known or used by others in the United States or patented or 
described in a printed publication in any country before my invention thereof, or patented or 
described in a printed publication in any country or in public use or on sale in the United States 
more than one year prior to the date(s) of said earlier application(s), or first patented or caused 
to be patented or made the subject of an inventor's certificate by me or my legal representatives 
or assigns in a country foreign to the United States prior to the date(s) of said earlier 
application(s) on an application filed more than twelve months (six months if this application 
is for a design) before the filing of said earlier application(s); and I acknowledge that no
application for patent or inventor's certificate on said subject matter has been filed by me or my 
representatives or assigns in any country foreign to the United States except those identified 
herein. 



PCT/US98/25354 November 25, 1998 Pending 

(Application Serial No.) (Filing Date) (Status) 

(patented,pending,aband.) 
POWER OF ATTORNEY: As a named inventor, I hereby appoint the following attorneys
and/or agents to prosecute this application and transact all business in the Patent and 
Trademark Office connected therewith. 



W. Hugo Liepmann Reg. No. 20,407 

James E. Cockfield Reg. No. 19,162

Thomas V. Smurzynski Reg. No. 24,798 

Ralph A. Loren Reg. No. 29,325 

Giulio A. DeConti, Jr. Reg. No. 31,503

Ann Lamport Hammitte Reg. No. 34,858 

Elizabeth A. Hanley Reg. No. 33,505 

Amy E. Mandragouras Reg. No. 36,207 

John V. Bianco Reg. No. 36,748 

Anthony A. Laurentano Reg. No. 38,220 

Jane E. Remillard Reg. No. 38,872 

Jeremiah Lynch Reg. No. 17,425 



Lawrence E. Monks Reg. No. 34,224 

David A. Lane, Jr. Reg. No. 39,261

Catherine J. Kara Reg. No. 41,106

Scott D. Rothenberger Reg. No. 41,277

Linda M. Chinn Reg. No. 31,240

Kevin J. Canning Reg. No. 35,470

Faustino A. Lichauco Reg. No. 41,942 

C. Eric Schulman Reg. No. 43,350 

Jeanne M. DiGiorgio Reg. No. 41,710 

Megan E. Williams Reg. No. 43,270 

Nicholas P. Triano III Reg. No. 36,397 

Peter C. Lauro Reg. No. 32,360 

Reza Mollaaghababa Reg. No. P43,810

Timothy J. Douros Reg. No. 41,716

Wherefore I petition that letters patent be granted to me for the invention or discovery described and
claimed in the attached specification and claims, and hereby subscribe my name to said specification 
and claims and to the foregoing declaration, power of attorney, and this petition. 

I hereby declare that all statements made herein of my own knowledge are true and that all 
statements made on information and belief are believed to be true; and further that these statements 
were made with the knowledge that willful false statements and the like so made are punishable by 
fine or imprisonment, or both, under Section 1001 of Title 18 of the United States Code and that 
such willful false statements may jeopardize the validity of the application or any patent issued 
thereon. 
United States of America



Post Office Address (if different) 
Same as Above 






Full name of second inventor, if any 
Jose E. N. Gonzales 



Inventor's signature Date




Residence 

7546 Dancy Road, San Diego, CA 92126



Citizenship 

United States of America 
Customer Number: 000959

Attorney's Docket Number TTI-180



Declaration, Petition and Power of Attorney for Patent Application 
As a below named inventor, I hereby declare that: 

My residence, post office address and citizenship are as stated below next to my name.

I believe I am the original, first and sole inventor (if only one name is listed below) or an 
original, first and joint inventor (if plural names are listed below) of the subject matter which 
is claimed and for which a patent is sought on the invention entitled 

NOVEL VECTORS AND GENES EXHIBITING INCREASED EXPRESSION 

the specification of which 

(check one) 

_ is attached hereto. 

X was filed on December 4, 1998 as 

Application Serial No. 09/205,817 

and was amended on . 

(if applicable) 

I do not know and do not believe that the subject matter of this application was known or 
used by others in the United States or patented or described in a printed publication in any 
country before my invention thereof, or patented or described in a printed publication in any 
country or in public use or on sale in the United States more than one year prior to the date of 
this application, or first patented or caused to be patented or made the subject of an inventor's 
certificate by me or my legal representatives or assigns in a country foreign to the United 
States prior to the date of this application on an application filed more than twelve months 
(six months if this application is for a design) before the filing of this application; and I 
acknowledge my duty to disclose information of which I am aware which is material to the 
examination of this application, that no application for patent or inventor's certificate on the 
subject matter of this application has been filed by me or my representatives or assigns in any 
country foreign to the United States, except those identified below, and that I have reviewed 
and understand the contents of the specification, including the claims as amended by any 
amendment referred to herein. 

I acknowledge the duty to disclose to the Office all information known to me to be material 
to patentability as defined in Title 37, Code of Federal Regulations, §1.56.



CLAIM OF BENEFIT OF EARLIER FOREIGN APPLICATION(S) 



I hereby claim priority benefits under Title 35, United States Code, §119 of any foreign
application(s) for patent or inventor's certificate listed below, and have also identified below 
any foreign application(s) for patent or inventor's certificate filed by me on the same subject 
matter having a filing date before that of the application(s) from which priority is claimed. 

Check one: 

X no such applications have been filed. 

_ such applications have been filed as follows 



EARLIEST FOREIGN APPLICATION(S), IF ANY, FILED WITHIN 12 MONTHS 
(6 MONTHS FOR DESIGN) PRIOR TO THIS U.S. APPLICATION 



Country    Application Number    Date of Filing (month, day, year)    Priority Claimed Under 35 USC 119

                                                                      _ Yes No _

                                                                      _ Yes No _

                                                                      _ Yes No _

                                                                      _ Yes No _

                                                                      _ Yes No _



ALL FOREIGN APPLICATION(S), IF ANY FILED MORE THAN 12 MONTHS 
(6 MONTHS FOR DESIGN) PRIOR TO THIS U.S. APPLICATION 

CLAIM FOR BENEFIT OF U.S. PROVISIONAL APPLICATION(S)



I hereby claim the benefit under 35 U.S.C. §119(e) of any United States provisional
application(s) listed below. 



60/071,596 January 16, 1998

(Application Serial No.) (Filing Date) 



60/067,614 December 5, 1997

(Application Serial No.) (Filing Date) 



CLAIM FOR BENEFIT OF EARLIER U.S./PCT APPLICATION(S) 



I hereby claim the benefit under Title 35, United States Code, §120 of any earlier United States 
application(s) or PCT international application(s) designating the United States listed below 
and, insofar as the subject matter of each of the claims of this application is not disclosed in the 
earlier application(s) in the manner provided by the first paragraph of Title 35, United States 
Code, §112, I acknowledge the duty to disclose to the Office all information known to me to be
material to patentability as defined in Title 37, Code of Federal Regulations, §1.56 which 
became available between the filing date(s) of the earlier application(s) and the national or 
PCT international filing date of this application. As to subject matter of this application which 
is common to my earlier application(s), if any, described below, I do not know and do not 
believe that the same was known or used by others in the United States or patented or 
described in a printed publication in any country before my invention thereof, or patented or 
described in a printed publication in any country or in public use or on sale in the United States 
more than one year prior to the date(s) of said earlier application(s), or first patented or caused 
to be patented or made the subject of an inventor's certificate by me or my legal representatives 
or assigns in a country foreign to the United States prior to the date(s) of said earlier 
application(s) on an application filed more than twelve months (six months if this application
is for a design) before the filing of said earlier application(s); and I acknowledge that no 
application for patent or inventor's certificate on said subject matter has been filed by me or my 
representatives or assigns in any country foreign to the United States except those identified 
herein. 



PCT/US98/25354 November 25, 1998 Pending 

(Application Serial No.) (Filing Date) (Status) 

(patented,pending,aband.) 



(Application Serial No.) (Filing Date) (Status) 

(patented,pending,aband.) 



POWER OF ATTORNEY: As a named inventor, I hereby appoint the following attorneys 
and/or agents to prosecute this application and transact all business in the Patent and 
Trademark Office connected therewith. 



W. Hugo Liepmann Reg. No. 20,407 

James E. Cockfield Reg. No. 19,162 

Thomas V. Smurzynski Reg. No. 24,798 

Ralph A. Loren Reg. No. 29,325 

Giulio A. DeConti, Jr. Reg. No. 31,503 

Ann Lamport Hammitte Reg. No. 34,858 

Elizabeth A. Hanley Reg. No. 33,505 

Amy E. Mandragouras Reg. No. 36,207 

John V. Bianco Reg. No. 36,748 

Anthony A. Laurentano Reg. No. 38,220 

Jane E. Remillard Reg. No. 38,872 

Jeremiah Lynch Reg. No. 17,425 



Lawrence E. Monks Reg. No. 34,224 

David A. Lane, Jr. Reg. No. 39,261 

Catherine J. Kara Reg. No. 41,106 

Scott D. Rothenberger Reg. No. 41,277 

Linda M. Chinn Reg. No. 31,240 

Kevin J. Canning Reg. No. 35,470 

Faustino A. Lichauco Reg. No. 41,942 

C. Eric Schulman Reg. No. 43,350 

Jeanne M. DiGiorgio Reg. No. 41,710 

Megan E. Williams Reg. No. 43,270 

Nicholas P. Triano III Reg. No. 36,397 

Peter C. Lauro Reg. No. 32,360 

Reza Mollaaghababa Reg. No. P43,810 

Timothy J. Douros Reg. No. 41,716 



Send Correspondence to Giulio A. DeConti, Jr. at Customer Number: 000959 whose address is: 

Lahive & Cockfield, LLP, 28 State Street, Boston, MA 02109 

Direct Telephone Calls to: (name and telephone number) 

Jane E. Remillard (617) 227-7400 

Wherefore I petition that letters patent be granted to me for the invention or discovery described and 
claimed in the attached specification and claims, and hereby subscribe my name to said specification 
and claims and to the foregoing declaration, power of attorney, and this petition. 

I hereby declare that all statements made herein of my own knowledge are true and that all 
statements made on information and belief are believed to be true; and further that these statements 
were made with the knowledge that willful false statements and the like so made are punishable by 
fine or imprisonment, or both, under Section 1001 of Title 18 of the United States Code and that 
such willful false statements may jeopardize the validity of the application or any patent issued 
thereon. 



Full name of sole or first inventor 




Charles R. Ill 




Inventor's signature 


Date 


Residence 




1098 Oceanic Drive, Encinitas, CA 92024 




Citizenship 




United States of America 




Post Office Address (if different) 
Full name of second inventor, if any 





Jose E. N. Gonzales 




Inventor's signature 


Date 


Residence 




040 uancy Koaa, ban Diego, ca yzizo 




Citizenship 




United States of America 




Post Office Address (if different) 




Same as Above 





Full name of third inventor, if any 




Claire Q. Yang 




Inventor's signature 


Date 


Residence 




7707 Sitro Musica, Carlsbad, CA 92009 




Citizenship 




United States of America 




Post Office Address (if different) 




Same as Above 





Full name of fourth inventor, if any 
Scott Bidlingmaier 



Inventor's signature Date 



Residence 

433 Edgewood Ave, New Haven, CT 06511 



Citizenship 

United States of America 



Post Office Address (if different) 
Same as Above 
SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: Ill, Charles R. et al. 

(ii) TITLE OF INVENTION: NOVEL VECTORS AND GENES EXHIBITING 

INCREASED EXPRESSION 

(iii) NUMBER OF SEQUENCES: 11 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: LAHIVE & COCKFIELD, LLP 

(B) STREET: 28 STATE STREET 

(C) CITY: BOSTON 

(D) STATE: MASSACHUSETTS 

(E) COUNTRY: US 

(F) ZIP: 02109 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 04 DECEMBER 1998 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 60/067,614 

(B) FILING DATE: 05 DECEMBER 1997 

(A) APPLICATION NUMBER: US 60/071,596 

(B) FILING DATE: 16 JANUARY 1998 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: REMILLARD, JANE E. 

(B) REGISTRATION NUMBER: 38,872 

(C) REFERENCE/DOCKET NUMBER: TTI-180 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (617)227-7400 

(B) TELEFAX: (617)742-4214 

(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4374 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..4374 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 



ATG GAA ATA GAG CTC TCC ACC TGC TTC TTT CTG TGC CTT TTG CGA TTC 48 
Met Glu Ile Glu Leu Ser Thr Cys Phe Phe Leu Cys Leu Leu Arg Phe 
1 5 10 15 

TGC TTT AGT GCC ACC AGA AGA TAC TAC CTG GGT GCA GTG GAA CTG TCA 96 
Cys Phe Ser Ala Thr Arg Arg Tyr Tyr Leu Gly Ala Val Glu Leu Ser 
20 25 30 

TGG GAC TAT ATG CAA AGT GAT CTC GGA GAG CTG CCT GTG GAC GCA AGA 144 
Trp Asp Tyr Met Gln Ser Asp Leu Gly Glu Leu Pro Val Asp Ala Arg 
35 40 45 

TTT CCT CCT CGC GTG CCA AAA TCT TTT CCA TTC AAC ACC TCA GTC GTG 192 
Phe Pro Pro Arg Val Pro Lys Ser Phe Pro Phe Asn Thr Ser Val Val 
50 55 60 

TAC AAA AAG ACT CTG TTT GTA GAA TTC ACG GTT CAC CTT TTC AAC ATC 240 
Tyr Lys Lys Thr Leu Phe Val Glu Phe Thr Val His Leu Phe Asn Ile 
65 70 75 80 

GCT AAG CCA AGG CCA CCC TGG ATG GGT CTG CTA GGT CCT ACC ATC CAA 288 
Ala Lys Pro Arg Pro Pro Trp Met Gly Leu Leu Gly Pro Thr Ile Gln 
85 90 95 

GCT GAG GTT TAT GAT ACA GTG GTC ATT ACA CTT AAG AAC ATG GCT TCC 336 
Ala Glu Val Tyr Asp Thr Val Val Ile Thr Leu Lys Asn Met Ala Ser 
100 105 110 

CAT CCT GTC TCC CTT CAT GCT GTT GGT GTA TCC TAC TGG AAA GCT TCT 384 
His Pro Val Ser Leu His Ala Val Gly Val Ser Tyr Trp Lys Ala Ser 
115 120 125 

GAG GGA GCT GAA TAT GAT GAT CAG ACC AGT CAA AGG GAG AAA GAA GAT 432 
Glu Gly Ala Glu Tyr Asp Asp Gln Thr Ser Gln Arg Glu Lys Glu Asp 
130 135 140 

GAT AAA GTC TTC CCT GGT GGA AGC CAT ACA TAT GTC TGG CAA GTC CTG 480 
Asp Lys Val Phe Pro Gly Gly Ser His Thr Tyr Val Trp Gln Val Leu 
145 150 155 160 

AAA GAG AAT GGT CCA ATG GCC TCC GAC CCA CTG TGC CTT ACC TAC TCA 528 
Lys Glu Asn Gly Pro Met Ala Ser Asp Pro Leu Cys Leu Thr Tyr Ser 
165 170 175 

TAT CTT TCT CAT GTG GAC CTG GTT AAA GAC TTG AAT TCA GGC CTC ATT 576 
Tyr Leu Ser His Val Asp Leu Val Lys Asp Leu Asn Ser Gly Leu Ile 
180 185 190 

GGA GCC CTA CTA GTA TGT AGA GAA GGG AGT CTG GCC AAG GAA AAG ACA 624 
Gly Ala Leu Leu Val Cys Arg Glu Gly Ser Leu Ala Lys Glu Lys Thr 
195 200 205 

CAG ACC TTG CAC AAA TTT ATA CTA CTT TTT GCT GTA TTT GAT GAA GGG 672 
Gln Thr Leu His Lys Phe Ile Leu Leu Phe Ala Val Phe Asp Glu Gly 
210 215 220 
AAA AGT TGG CAC TCA GAA ACA AAG AAC TCC CTC ATG CAA GAT AGG GAT 720 
Lys Ser Trp His Ser Glu Thr Lys Asn Ser Leu Met Gln Asp Arg Asp 
225 230 235 240 

GCT GCA TCT GCT CGG GCC TGG CCT AAA ATG CAC ACA GTC AAT GGT TAT 768 
Ala Ala Ser Ala Arg Ala Trp Pro Lys Met His Thr Val Asn Gly Tyr 
245 250 255 

GTA AAC AGG AGC CTG CCA GGA CTG ATT GGA TGC CAC AGG AAA TCA GTC 816 
Val Asn Arg Ser Leu Pro Gly Leu Ile Gly Cys His Arg Lys Ser Val 
260 265 270 

TAT TGG CAT GTT ATA GGA ATG GGC ACC ACT CCT GAA GTG CAC TCA ATA 864 
Tyr Trp His Val Ile Gly Met Gly Thr Thr Pro Glu Val His Ser Ile 
275 280 285 

TTC CTC GAA GGA CAC ACA TTT CTT GTT AGA AAC CAT CGC CAG GCG TCC 912 
Phe Leu Glu Gly His Thr Phe Leu Val Arg Asn His Arg Gln Ala Ser 
290 295 300 

TTG GAA ATC TCG CCA ATA ACT TTC CTT ACT GCT CAA ACA CTC CTC ATG 960 
Leu Glu Ile Ser Pro Ile Thr Phe Leu Thr Ala Gln Thr Leu Leu Met 
305 310 315 320 

GAC CTT GGA CAG TTT CTA CTG TTT TGT CAT ATC TCT TCC CAC CAA CAT 1008 
Asp Leu Gly Gln Phe Leu Leu Phe Cys His Ile Ser Ser His Gln His 
325 330 335 

GAT GGC ATG GAA GCT TAT GTC AAA GTA GAC AGC TGT CCA GAG GAA CCC 1056 
Asp Gly Met Glu Ala Tyr Val Lys Val Asp Ser Cys Pro Glu Glu Pro 
340 345 350 

CAA CTA CGA ATG AAA AAT AAT GAA GAA GCG GAA GAC TAT GAT GAT GAT 1104 
Gln Leu Arg Met Lys Asn Asn Glu Glu Ala Glu Asp Tyr Asp Asp Asp 
355 360 365 

CTT ACC GAT TCT GAA ATG GAT GTG GTC AGA TTT GAT GAT GAC AAC TCT 1152 
Leu Thr Asp Ser Glu Met Asp Val Val Arg Phe Asp Asp Asp Asn Ser 
370 375 380 

CCT TCC TTT ATC CAA ATT CGC TCA GTT GCC AAG AAG CAT CCT AAA ACT 1200 
Pro Ser Phe Ile Gln Ile Arg Ser Val Ala Lys Lys His Pro Lys Thr 
385 390 395 400 

TGG GTA CAT TAC ATT GCT GCT GAA GAG GAG GAC TGG GAC TAT GCT CCC 1248 
Trp Val His Tyr Ile Ala Ala Glu Glu Glu Asp Trp Asp Tyr Ala Pro 
405 410 415 

TTA GTC CTC GCC CCC GAT GAC AGA AGT TAT AAA AGT CAA TAT TTG AAC 1296 
Leu Val Leu Ala Pro Asp Asp Arg Ser Tyr Lys Ser Gln Tyr Leu Asn 
420 425 430 

AAT GGC CCT CAG CGG ATT GGA AGG AAG TAC AAA AAA GTC CGA TTT ATG 1344 
Asn Gly Pro Gln Arg Ile Gly Arg Lys Tyr Lys Lys Val Arg Phe Met 
435 440 445 






GCA TAC ACA GAT GAA ACC TTT AAG ACT CGT GAA GCT ATT CAG CAT GAA 1392 
Ala Tyr Thr Asp Glu Thr Phe Lys Thr Arg Glu Ala Ile Gln His Glu 
450 455 460 

TCA GGA ATC TTG GGA CCT TTA CTT TAT GGG GAA GTT GGA GAC ACA CTG 1440 
Ser Gly Ile Leu Gly Pro Leu Leu Tyr Gly Glu Val Gly Asp Thr Leu 
465 470 475 480 

CTC ATT ATA TTT AAG AAT CAA GCA AGC AGA CCA TAT AAC ATC TAC CCT 1488 
Leu Ile Ile Phe Lys Asn Gln Ala Ser Arg Pro Tyr Asn Ile Tyr Pro 
485 490 495 

CAC GGA ATC ACC GAT GTC CGT CCT TTG TAT TCA CGC AGA TTA CCA AAA 1536 
His Gly Ile Thr Asp Val Arg Pro Leu Tyr Ser Arg Arg Leu Pro Lys 
500 505 510 

GGA GTA AAA CAT TTG AAG GAT TTT CCA ATT CTG CCC GGA GAA ATA TTC 1584 
Gly Val Lys His Leu Lys Asp Phe Pro Ile Leu Pro Gly Glu Ile Phe 
515 520 525 

AAA TAT AAA TGG ACA GTG ACT GTA GAA GAT GGG CCA ACT AAA TCA GAT 1632 
Lys Tyr Lys Trp Thr Val Thr Val Glu Asp Gly Pro Thr Lys Ser Asp 
530 535 540 

CCT CGG TGC CTG ACC CGC TAT TAC TCT AGT TTC GTC AAT ATG GAG AGA 1680 
Pro Arg Cys Leu Thr Arg Tyr Tyr Ser Ser Phe Val Asn Met Glu Arg 
545 550 555 560 

GAT CTA GCT TCA GGA CTC ATT GGC CCT CTC CTC ATC TGC TAC AAA GAA 1728 
Asp Leu Ala Ser Gly Leu Ile Gly Pro Leu Leu Ile Cys Tyr Lys Glu 
565 570 575 

TCT GTA GAT CAA AGA GGA AAC CAG ATA ATG TCA GAC AAG AGG AAT GTC 1776 
Ser Val Asp Gln Arg Gly Asn Gln Ile Met Ser Asp Lys Arg Asn Val 
580 585 590 

ATC CTG TTT TCT GTA TTT GAT GAG AAC CGA AGC TGG TAC CTC ACA GAG 1824 
Ile Leu Phe Ser Val Phe Asp Glu Asn Arg Ser Trp Tyr Leu Thr Glu 
595 600 605 

AAT ATA CAA CGC TTT CTC CCC AAT CCC GCT GGA GTG CAG CTT GAG GAT 1872 
Asn Ile Gln Arg Phe Leu Pro Asn Pro Ala Gly Val Gln Leu Glu Asp 
610 615 620 

CCA GAG TTC CAA GCC TCC AAC ATC ATG CAC AGC ATC AAT GGC TAT GTT 1920 
Pro Glu Phe Gln Ala Ser Asn Ile Met His Ser Ile Asn Gly Tyr Val 
625 630 635 640 

TTC GAT AGT TTG CAG TTG TCA GTT TGT TTG CAT GAA GTA GCA TAC TGG 1968 
Phe Asp Ser Leu Gln Leu Ser Val Cys Leu His Glu Val Ala Tyr Trp 
645 650 655 

TAC ATT CTA AGC ATT GGA GCA CAG ACT GAC TTC CTT TCT GTC TTC TTC 2016 
Tyr Ile Leu Ser Ile Gly Ala Gln Thr Asp Phe Leu Ser Val Phe Phe 
660 665 670 






TCT GGA TAT ACC TTC AAA CAC AAA ATG GTC TAT GAA GAC ACA CTC ACC 2064 
Ser Gly Tyr Thr Phe Lys His Lys Met Val Tyr Glu Asp Thr Leu Thr 
675 680 685 

CTA TTC CCA TTC TCC GGA GAA ACT GTC TTC ATG TCG ATG GAA AAC CCA 2112 
Leu Phe Pro Phe Ser Gly Glu Thr Val Phe Met Ser Met Glu Asn Pro 
690 695 700 

GGA CTA TGG ATT CTG GGG TGC CAC AAC TCA GAC TTT CGG AAC AGA GGC 2160 
Gly Leu Trp Ile Leu Gly Cys His Asn Ser Asp Phe Arg Asn Arg Gly 
705 710 715 720 

ATG ACC GCC TTA CTG AAA GTT TCC AGT TGT GAC AAG AAC ACT GGA GAT 2208 
Met Thr Ala Leu Leu Lys Val Ser Ser Cys Asp Lys Asn Thr Gly Asp 
725 730 735 

TAT TAC GAG GAC AGT TAT GAA GAT ATT TCA GCA TAC TTG CTG AGT AAA 2256 
Tyr Tyr Glu Asp Ser Tyr Glu Asp Ile Ser Ala Tyr Leu Leu Ser Lys 
740 745 750 

AAC AAT GCC ATT GAA CCA AGA AGC TTC TCC CAG AAC CCA CCA GTC TTG 2304 
Asn Asn Ala Ile Glu Pro Arg Ser Phe Ser Gln Asn Pro Pro Val Leu 
755 760 765 

AAA CGC CAT CAA CGG GAA ATA ACT CGT ACT ACT CTT CAA TCA GAT CAA 2352 
Lys Arg His Gln Arg Glu Ile Thr Arg Thr Thr Leu Gln Ser Asp Gln 
770 775 780 

GAG GAA ATT GAC TAT GAT GAT ACC ATA TCA GTT GAA ATG AAG AAG GAA 2400 
Glu Glu Ile Asp Tyr Asp Asp Thr Ile Ser Val Glu Met Lys Lys Glu 
785 790 795 800 

GAT TTC GAC ATT TAT GAT GAG GAT GAA AAT CAG AGC CCC CGC AGC TTT 2448 
Asp Phe Asp Ile Tyr Asp Glu Asp Glu Asn Gln Ser Pro Arg Ser Phe 
805 810 815 

CAA AAG AAA ACA CGA CAC TAT TTT ATT GCT GCA GTG GAG AGG CTC TGG 2496 
Gln Lys Lys Thr Arg His Tyr Phe Ile Ala Ala Val Glu Arg Leu Trp 
820 825 830 

GAT TAT GGG ATG AGT AGC TCC CCA CAT GTT CTA AGA AAC AGG GCT CAG 2544 
Asp Tyr Gly Met Ser Ser Ser Pro His Val Leu Arg Asn Arg Ala Gln 
835 840 845 

AGT GGC AGT GTC CCT CAG TTC AAG AAA GTA GTA TTC CAG GAA TTT ACC 2592 
Ser Gly Ser Val Pro Gln Phe Lys Lys Val Val Phe Gln Glu Phe Thr 
850 855 860 

GAT GGC TCC TTT ACT CAA CCC TTA TAC CGT GGA GAA CTA AAT GAA CAT 2640 
Asp Gly Ser Phe Thr Gln Pro Leu Tyr Arg Gly Glu Leu Asn Glu His 
865 870 875 880 

TTG GGA CTC CTG GGG CCA TAT ATA AGA GCA GAA GTT GAA GAT AAT ATC 2688 
Leu Gly Leu Leu Gly Pro Tyr Ile Arg Ala Glu Val Glu Asp Asn Ile 
885 890 895 
ATG GTT ACC TTC AGA AAT CAG GCC TCT CGT CCC TAT TCC TTC TAT TCT 2736 
Met Val Thr Phe Arg Asn Gln Ala Ser Arg Pro Tyr Ser Phe Tyr Ser 
900 905 910 

TCC CTC ATA TCA TAT GAG GAA GAT CAG AGG CAA GGA GCA GAA CCT AGA 2784 
Ser Leu Ile Ser Tyr Glu Glu Asp Gln Arg Gln Gly Ala Glu Pro Arg 
915 920 925 

AAA AAC TTT GTC AAG CCT AAT GAA ACC AAA ACT TAC TTT TGG AAA GTG 2832 
Lys Asn Phe Val Lys Pro Asn Glu Thr Lys Thr Tyr Phe Trp Lys Val 
930 935 940 

CAA CAT CAT ATG GCA CCC ACT AAA GAT GAG TTT GAC TGC AAA GCC TGG 2880 
Gln His His Met Ala Pro Thr Lys Asp Glu Phe Asp Cys Lys Ala Trp 
945 950 955 960 

GCT TAT TTC TCC GAT GTC GAC CTG GAA AAA GAT GTG CAC TCA GGC CTG 2928 
Ala Tyr Phe Ser Asp Val Asp Leu Glu Lys Asp Val His Ser Gly Leu 
965 970 975 

ATT GGA CCC CTT CTG GTC TGC CAC ACC AAC ACA CTG AAC CCT GCT CAT 2976 
Ile Gly Pro Leu Leu Val Cys His Thr Asn Thr Leu Asn Pro Ala His 
980 985 990 

GGG AGA CAA GTG ACA GTA CAG GAA TTT GCT CTG TTT TTC ACC ATC TTC 3024 
Gly Arg Gln Val Thr Val Gln Glu Phe Ala Leu Phe Phe Thr Ile Phe 
995 1000 1005 

GAT GAG ACC AAA AGC TGG TAC TTC ACT GAA AAT ATG GAA AGA AAC TGC 3072 
Asp Glu Thr Lys Ser Trp Tyr Phe Thr Glu Asn Met Glu Arg Asn Cys 
1010 1015 1020 

AGG GCT CCC TGC AAT ATC CAG ATG GAA GAT CCC ACT TTT AAA GAG AAT 3120 
Arg Ala Pro Cys Asn Ile Gln Met Glu Asp Pro Thr Phe Lys Glu Asn 
1025 1030 1035 1040 

TAT CGC TTC CAT GCA ATC AAT GGC TAC ATA ATG GAT ACA CTA CCT GGC 3168 
Tyr Arg Phe His Ala Ile Asn Gly Tyr Ile Met Asp Thr Leu Pro Gly 
1045 1050 1055 

TTA GTA ATG GCT CAG GAT CAA AGG ATT CGA TGG TAT CTG CTC AGC ATG 3216 
Leu Val Met Ala Gln Asp Gln Arg Ile Arg Trp Tyr Leu Leu Ser Met 
1060 1065 1070 

GGC AGC AAT GAA AAC ATC CAT TCT ATT CAT TTC TCC GGA CAT GTG TTC 3264 
Gly Ser Asn Glu Asn Ile His Ser Ile His Phe Ser Gly His Val Phe 
1075 1080 1085 

ACT GTA CGA AAA AAA GAG GAG TAT AAA ATG GCA CTG TAC AAT CTC TAT 3312 
Thr Val Arg Lys Lys Glu Glu Tyr Lys Met Ala Leu Tyr Asn Leu Tyr 
1090 1095 1100 

CCC GGA GTT TTC GAG ACA GTG GAA ATG TTA CCA TCC AAA GCT GGA ATT 3360 
Pro Gly Val Phe Glu Thr Val Glu Met Leu Pro Ser Lys Ala Gly Ile 
1105 1110 1115 1120 

TGG CGG GTG GAA TGC CTT ATT GGC GAG CAT CTA CAT GCT GGG ATG AGC 3408 
Trp Arg Val Glu Cys Leu Ile Gly Glu His Leu His Ala Gly Met Ser 
1125 1130 1135 

ACA CTT TTT CTG GTG TAC TCC AAT AAG TGT CAG ACT CCC CTG GGA ATG 3456 
Thr Leu Phe Leu Val Tyr Ser Asn Lys Cys Gln Thr Pro Leu Gly Met 
1140 1145 1150 

GCT TCT GGA CAC ATT AGA GAT TTT CAG ATT ACA GCT TCA GGA CAA TAT 3504 
Ala Ser Gly His Ile Arg Asp Phe Gln Ile Thr Ala Ser Gly Gln Tyr 
1155 1160 1165 

GGA CAG TGG GCC CCA AAG CTG GCC AGA CTT CAT TAT TCC GGA TCA ATC 3552 
Gly Gln Trp Ala Pro Lys Leu Ala Arg Leu His Tyr Ser Gly Ser Ile 
1170 1175 1180 

AAT GCC TGG AGC ACC AAG GAG CCC TTT TCT TGG ATC AAA GTT GAC CTG 3600 
Asn Ala Trp Ser Thr Lys Glu Pro Phe Ser Trp Ile Lys Val Asp Leu 
1185 1190 1195 1200 

TTG GCA CCA ATG ATT ATT CAC GGC ATC AAG ACC CAG GGT GCC CGT CAG 3648 
Leu Ala Pro Met Ile Ile His Gly Ile Lys Thr Gln Gly Ala Arg Gln 
1205 1210 1215 

AAG TTC TCC AGC CTC TAC ATC TCT CAA TTT ATC ATC ATG TAT AGT CTC 3696 
Lys Phe Ser Ser Leu Tyr Ile Ser Gln Phe Ile Ile Met Tyr Ser Leu 
1220 1225 1230 

GAT GGG AAG AAG TGG CAG ACT TAT CGA GGA AAT TCC ACT GGA ACC CTC 3744 
Asp Gly Lys Lys Trp Gln Thr Tyr Arg Gly Asn Ser Thr Gly Thr Leu 
1235 1240 1245 

ATG GTC TTC TTT GGC AAT GTG GAT TCA TCT GGG ATA AAA CAC AAT ATT 3792 
Met Val Phe Phe Gly Asn Val Asp Ser Ser Gly Ile Lys His Asn Ile 
1250 1255 1260 

TTC AAC CCT CCA ATT ATT GCT CGA TAC ATC CGT TTG CAC CCA ACT CAT 3840 
Phe Asn Pro Pro Ile Ile Ala Arg Tyr Ile Arg Leu His Pro Thr His 
1265 1270 1275 1280 

TAT AGC ATT CGC AGC ACT CTT CGC ATG GAG TTG ATG GGC TGT GAT TTA 3888 
Tyr Ser Ile Arg Ser Thr Leu Arg Met Glu Leu Met Gly Cys Asp Leu 
1285 1290 1295 

AAT AGT TGC AGC ATG CCA TTG GGA ATG GAG AGT AAA GCA ATA TCA GAT 3936 
Asn Ser Cys Ser Met Pro Leu Gly Met Glu Ser Lys Ala Ile Ser Asp 
1300 1305 1310 

GCA CAG ATT ACT GCT TCA TCC TAC TTT ACC AAT ATG TTT GCC ACC TGG 3984 
Ala Gln Ile Thr Ala Ser Ser Tyr Phe Thr Asn Met Phe Ala Thr Trp 
1315 1320 1325 

TCT CCT TCA AAA GCT CGA CTA CAC CTA CAA GGG AGG AGT AAT GCC TGG 4032 
Ser Pro Ser Lys Ala Arg Leu His Leu Gln Gly Arg Ser Asn Ala Trp 
1330 1335 1340 

AGA CCT CAA GTT AAC AAT CCA AAA GAG TGG CTG CAA GTG GAC TTC CAG 4080 
Arg Pro Gln Val Asn Asn Pro Lys Glu Trp Leu Gln Val Asp Phe Gln 
1345 1350 1355 1360 

AAG ACA ATG AAA GTC ACA GGA GTA ACT ACT CAG GGA GTA AAA TCT CTG 4128 
Lys Thr Met Lys Val Thr Gly Val Thr Thr Gln Gly Val Lys Ser Leu 
1365 1370 1375 

CTT ACC TCT ATG TAC GTG AAG GAG TTC CTC ATA TCG TCG TCG CAA GAT 4176 
Leu Thr Ser Met Tyr Val Lys Glu Phe Leu Ile Ser Ser Ser Gln Asp 
1380 1385 1390 

GGC CAT CAG TGG ACT CTC TTT TTT CAA AAT GGC AAA GTA AAA GTT TTC 4224 
Gly His Gln Trp Thr Leu Phe Phe Gln Asn Gly Lys Val Lys Val Phe 
1395 1400 1405 

CAG GGA AAT CAA GAC TCC TTC ACA CCT GTC GTG AAC TCT CTA GAC CCA 4272 
Gln Gly Asn Gln Asp Ser Phe Thr Pro Val Val Asn Ser Leu Asp Pro 
1410 1415 1420 

CCG TTA CTC ACT CGC TAC CTT CGA ATT CAC CCC CAG AGT TGG GTG CAC 4320 
Pro Leu Leu Thr Arg Tyr Leu Arg Ile His Pro Gln Ser Trp Val His 
1425 1430 1435 1440 

CAG ATT GCC CTG AGG ATG GAG GTT CTG GGC TGC GAG GCA CAG GAC CTC 4368 
Gln Ile Ala Leu Arg Met Glu Val Leu Gly Cys Glu Ala Gln Asp Leu 
1445 1450 1455 

TAC TGA 4374 
Tyr 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9164 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1006..5376 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

GTCGACGGTA TCGATAAGCT TGATATCGAA TTCCTGCAGC CCGGGGGATC CACTAGTACT 60 

CGAGACCTAG GAGTTAATTT TTAAAAAGCA GTCAAAAGTC CAAGTGGCCC TTGCGAGCAT 120 

TTACTCTCTC TGTTTGCTCT GGTTAATAAT CTCAGGAGCA CAAACATTCC TTACTAGTCC 180 

TAGAAGTTAA TTTTTAAAAA GCAGTCAAAA GTCCAAGTGG CCCTTGCGAG CATTTACTCT 240 

CTCTGTTTGC TCTGGTTAAT AATCTCAGGA GCACAAACAT TCCTTACTAG TTCTAGAGCG 300 

GCCGCCAGTG TGCTGGAATT CGGCTTTTTT AGGGCTGGAA GCTACCTTTG ACATCATTTC 360 

CTCTGCGAAT GCATGTATAA TTTCTACAGA ACCTATTAGA AAGGATCACC CAGCCTCTGC 420 

TTTTGTACAA CTTTCCCTTA AAAAACTGCC AATTCCACTG CTGTTTGGCC CAATAGTGAG 480 

AACTTTTTCC TGCTGCCTCT TGGTGCTTTT GCCTATGGCC CCTATTCTGC CTGCTGAAGA 540 

CACTCTTGCC AGCATGGACT TAAACCCCTC CAGCTCTGAC AATCCTCTTT CTCTTTTGTT 600 

TTACATGAAG GGTCTGGCAG CCAAAGCAAT CACTCAAAGT TCAAACCTTA TCATTTTTTG 660 

CTTTGTTCCT CTTGGCCTTG GTTTTGTACA TCAGCTTTGA AAATACCATC CCAGGGTTAA 720 

TGCTGGGGTT AATTTATAAC TAAGAGTGCT CTAGTTTTGC AATACAGGAC ATGCTATAAA 780 

AATGGAAAGA TGTTGCTTTC TGAGAGATCT CGAGGAAGCT AACAACAAAG AACAACAAAC 840 

AACAATCAGG TAAGTATCCT TTTTACAGCA CAACTTAATG AGACAGATAG AAACTGGTCT 900 

TGTAGAAACA GAGTAGTCGC CTGCTTTTCT GCCAGGTGCT GACTTCTCTC CCCTTCTCTT 960 

TTTTCCTTTT CTCAGGATAA CAAGAACGAA ACAATAACAG CCACC ATG GAA ATA 1014 
Met Glu Ile 
1 

GAG CTC TCC ACC TGC TTC TTT CTG TGC CTT TTG CGA TTC TGC TTT AGT 1062 
Glu Leu Ser Thr Cys Phe Phe Leu Cys Leu Leu Arg Phe Cys Phe Ser 
5 10 15 

GCC ACC AGA AGA TAC TAC CTG GGT GCA GTG GAA CTG TCA TGG GAC TAT 1110 
Ala Thr Arg Arg Tyr Tyr Leu Gly Ala Val Glu Leu Ser Trp Asp Tyr 
20 25 30 35 

ATG CAA AGT GAT CTC GGT GAG CTG CCT GTG GAC GCA AGA TTT CCT CCT 1158 
Met Gln Ser Asp Leu Gly Glu Leu Pro Val Asp Ala Arg Phe Pro Pro 
40 45 50 

AGA GTG CCA AAA TCT TTT CCA TTC AAC ACC TCA GTC GTG TAC AAA AAG 1206 
Arg Val Pro Lys Ser Phe Pro Phe Asn Thr Ser Val Val Tyr Lys Lys 
55 60 65 

ACT CTG TTT GTA GAA TTC ACG GTT CAC CTT TTC AAC ATC GCT AAG CCA 1254 
Thr Leu Phe Val Glu Phe Thr Val His Leu Phe Asn Ile Ala Lys Pro 
70 75 80 

AGG CCA CCC TGG ATG GGT CTG CTA GGT CCT ACC ATC CAG GCT GAG GTT 1302 
Arg Pro Pro Trp Met Gly Leu Leu Gly Pro Thr Ile Gln Ala Glu Val 
85 90 95 

TAT GAT ACA GTG GTC ATT ACA CTT AAG AAC ATG GCT TCC CAT CCT GTC 1350 
Tyr Asp Thr Val Val Ile Thr Leu Lys Asn Met Ala Ser His Pro Val 
100 105 110 115 

AGT CTT CAT GCT GTT GGT GTA TCC TAC TGG AAA GCT TCT GAG GGA GCT 1398 
Ser Leu His Ala Val Gly Val Ser Tyr Trp Lys Ala Ser Glu Gly Ala 
120 125 130 
GAA TAT GAT GAT CAG ACC AGT CAA AGG GAG AAA GAA GAT GAT AAA GTC 1446 
Glu Tyr Asp Asp Gln Thr Ser Gln Arg Glu Lys Glu Asp Asp Lys Val 
135 140 145 

TTC CCT GGT GGA AGC CAT ACA TAT GTC TGG CAG GTC CTG AAA GAG AAT 1494 
Phe Pro Gly Gly Ser His Thr Tyr Val Trp Gln Val Leu Lys Glu Asn 
150 155 160 

GGT CCA ATG GCC TCT GAC CCA CTG TGC CTT ACC TAC TCA TAT CTT TCT 1542 
Gly Pro Met Ala Ser Asp Pro Leu Cys Leu Thr Tyr Ser Tyr Leu Ser 
165 170 175 

CAT GTG GAC CTG GTA AAA GAC TTG AAT TCA GGC CTC ATT GGA GCC CTA 1590 
His Val Asp Leu Val Lys Asp Leu Asn Ser Gly Leu Ile Gly Ala Leu 
180 185 190 195 

CTA GTA TGT AGA GAA GGG AGT CTG GCC AAG GAA AAG ACA CAG ACC TTG 1638 
Leu Val Cys Arg Glu Gly Ser Leu Ala Lys Glu Lys Thr Gln Thr Leu 
200 205 210 

CAC AAA TTT ATA CTA CTT TTT GCT GTA TTT GAT GAA GGG AAA AGT TGG 1686 
His Lys Phe Ile Leu Leu Phe Ala Val Phe Asp Glu Gly Lys Ser Trp 
215 220 225 

CAC TCA GAA ACA AAG AAC TCC TTG ATG CAG GAT AGG GAT GCT GCA TCT 1734 
His Ser Glu Thr Lys Asn Ser Leu Met Gln Asp Arg Asp Ala Ala Ser 
230 235 240 

GCT CGG GCC TGG CCT AAA ATG CAC ACA GTC AAT GGT TAT GTA AAC AGG 1782 
Ala Arg Ala Trp Pro Lys Met His Thr Val Asn Gly Tyr Val Asn Arg 
245 250 255 

TCT CTG CCA GGT CTG ATT GGA TGC CAC AGG AAA TCA GTC TAT TGG CAT 1830 
Ser Leu Pro Gly Leu Ile Gly Cys His Arg Lys Ser Val Tyr Trp His 
260 265 270 275 

GTG ATT GGA ATG GGC ACC ACT CCT GAA GTG CAC TCA ATA TTC CTC GAA 1878 
Val Ile Gly Met Gly Thr Thr Pro Glu Val His Ser Ile Phe Leu Glu 
280 285 290 

GGT CAC ACA TTT CTT GTG AGG AAC CAT CGC CAG GCG TCC TTG GAA ATC 1926 
Gly His Thr Phe Leu Val Arg Asn His Arg Gln Ala Ser Leu Glu Ile 
295 300 305 

TCG CCA ATA ACT TTC CTT ACT GCT CAA ACA CTC TTG ATG GAC CTT GGA 1974 
Ser Pro Ile Thr Phe Leu Thr Ala Gln Thr Leu Leu Met Asp Leu Gly 
310 315 320 

CAG TTT CTA CTG TTT TGT CAT ATC TCT TCC CAC CAA CAT GAT GGC ATG 2022 
Gln Phe Leu Leu Phe Cys His Ile Ser Ser His Gln His Asp Gly Met 
325 330 335 

GAA GCT TAT GTC AAA GTA GAC AGC TGT CCA GAG GAA CCC CAA CTA CGA 2070 
Glu Ala Tyr Val Lys Val Asp Ser Cys Pro Glu Glu Pro Gln Leu Arg 
340 345 350 355 
ATG AAA AAT AAT GAA GAA GCG GAA GAC TAT GAT GAT GAT CTT ACT GAT 2118 
Met Lys Asn Asn Glu Glu Ala Glu Asp Tyr Asp Asp Asp Leu Thr Asp 
360 365 370 

TCT GAA ATG GAT GTG GTC AGG TTT GAT GAT GAC AAC TCT CCT TCC TTT 2166 
Ser Glu Met Asp Val Val Arg Phe Asp Asp Asp Asn Ser Pro Ser Phe 
375 380 385 

ATC CAA ATT CGC TCA GTT GCC AAG AAG CAT CCT AAA ACT TGG GTA CAT 2214 
Ile Gln Ile Arg Ser Val Ala Lys Lys His Pro Lys Thr Trp Val His 
390 395 400 

TAC ATT GCT GCT GAA GAG GAG GAC TGG GAC TAT GCT CCC TTA GTC CTC 2262 
Tyr Ile Ala Ala Glu Glu Glu Asp Trp Asp Tyr Ala Pro Leu Val Leu 
405 410 415 

GCC CCC GAT GAC AGA AGT TAT AAA AGT CAA TAT TTG AAC AAT GGC CCT 2310 
Ala Pro Asp Asp Arg Ser Tyr Lys Ser Gln Tyr Leu Asn Asn Gly Pro 
420 425 430 435 

CAG CGG ATT GGT AGG AAG TAC AAA AAA GTC CGA TTT ATG GCA TAC ACA 2358 
Gln Arg Ile Gly Arg Lys Tyr Lys Lys Val Arg Phe Met Ala Tyr Thr 
440 445 450 

GAT GAA ACC TTT AAG ACT CGT GAA GCT ATT CAG CAT GAA TCA GGA ATC 2406 
Asp Glu Thr Phe Lys Thr Arg Glu Ala Ile Gln His Glu Ser Gly Ile 
455 460 465 

TTG GGA CCT TTA CTT TAT GGG GAA GTT GGA GAC ACA CTG TTG ATT ATA 2454 
Leu Gly Pro Leu Leu Tyr Gly Glu Val Gly Asp Thr Leu Leu Ile Ile 
470 475 480 

TTT AAG AAT CAA GCA AGC AGA CCA TAT AAC ATC TAC CCT CAC GGA ATC 2502 
Phe Lys Asn Gln Ala Ser Arg Pro Tyr Asn Ile Tyr Pro His Gly Ile 
485 490 495 

ACT GAT GTC CGT CCT TTG TAT TCA AGG AGA TTA CCA AAA GGT GTA AAA 2550 
Thr Asp Val Arg Pro Leu Tyr Ser Arg Arg Leu Pro Lys Gly Val Lys 
500 505 510 515 

CAT TTG AAG GAT TTT CCA ATT CTG CCA GGA GAA ATA TTC AAA TAT AAA 2598 
His Leu Lys Asp Phe Pro Ile Leu Pro Gly Glu Ile Phe Lys Tyr Lys 
520 525 530 

TGG ACA GTG ACT GTA GAA GAT GGG CCA ACT AAA TCA GAT CCT CGG TGC 2646 
Trp Thr Val Thr Val Glu Asp Gly Pro Thr Lys Ser Asp Pro Arg Cys 
535 540 545 

CTG ACC CGC TAT TAC TCT AGT TTC GTT AAT ATG GAG AGA GAT CTA GCT 2694 
Leu Thr Arg Tyr Tyr Ser Ser Phe Val Asn Met Glu Arg Asp Leu Ala 
550 555 560 

TCA GGA CTC ATT GGC CCT CTC CTC ATC TGC TAC AAA GAA TCT GTA GAT 2742 
Ser Gly Leu Ile Gly Pro Leu Leu Ile Cys Tyr Lys Glu Ser Val Asp 
565 570 575 
CAA AGA GGA AAC CAG ATA ATG TCA GAC AAG AGG AAT GTC ATC CTG TTT 2790 
Gln Arg Gly Asn Gln Ile Met Ser Asp Lys Arg Asn Val Ile Leu Phe 
580 585 590 595 

TCT GTA TTT GAT GAG AAC CGA AGC TGG TAC CTC ACA GAG AAT ATA CAA 2838 
Ser Val Phe Asp Glu Asn Arg Ser Trp Tyr Leu Thr Glu Asn Ile Gln 
600 605 610 

CGC TTT CTC CCC AAT CCA GCT GGA GTG CAG CTT GAG GAT CCA GAG TTC 2886 
Arg Phe Leu Pro Asn Pro Ala Gly Val Gln Leu Glu Asp Pro Glu Phe 
615 620 625 

CAA GCC TCC AAC ATC ATG CAC AGC ATC AAT GGC TAT GTT TTT GAT AGT 2934 
Gln Ala Ser Asn Ile Met His Ser Ile Asn Gly Tyr Val Phe Asp Ser 
630 635 640 

TTG CAG TTG TCA GTT TGT TTG CAT GAG GTG GCA TAC TGG TAC ATT CTA 2982 
Leu Gln Leu Ser Val Cys Leu His Glu Val Ala Tyr Trp Tyr Ile Leu 
645 650 655 

AGC ATT GGA GCA CAG ACT GAC TTC CTT TCT GTC TTC TTC TCT GGA TAT 3030 
Ser Ile Gly Ala Gln Thr Asp Phe Leu Ser Val Phe Phe Ser Gly Tyr 
660 665 670 675 

ACC TTC AAA CAC AAA ATG GTC TAT GAA GAC ACA CTC ACC CTA TTC CCA 3078 
Thr Phe Lys His Lys Met Val Tyr Glu Asp Thr Leu Thr Leu Phe Pro 
680 685 690 

TTC TCA GGA GAA ACT GTC TTC ATG TCG ATG GAA AAC CCA GGT CTA TGG 3126 
Phe Ser Gly Glu Thr Val Phe Met Ser Met Glu Asn Pro Gly Leu Trp 
695 700 705 

ATT CTG GGG TGC CAC AAC TCA GAC TTT CGG AAC AGA GGC ATG ACC GCC 3174 
Ile Leu Gly Cys His Asn Ser Asp Phe Arg Asn Arg Gly Met Thr Ala 
710 715 720 

TTA CTG AAG GTT TCT AGT TGT GAC AAG AAC ACT GGT GAT TAT TAC GAG 3222 
Leu Leu Lys Val Ser Ser Cys Asp Lys Asn Thr Gly Asp Tyr Tyr Glu 
725 730 735 

GAC AGT TAT GAA GAT ATT TCA GCA TAC TTG CTG AGT AAA AAC AAT GCC 3270 
Asp Ser Tyr Glu Asp Ile Ser Ala Tyr Leu Leu Ser Lys Asn Asn Ala 
740 745 750 755 

ATT GAA CCA AGA AGC TTC TCC CAG AAC CCA CCA GTC TTG AAA CGC CAT 3318 
Ile Glu Pro Arg Ser Phe Ser Gln Asn Pro Pro Val Leu Lys Arg His 
760 765 770 

CAA CGG GAA ATA ACT CGT ACT ACT CTT CAG TCA GAT CAA GAG GAA ATT 3366 
Gln Arg Glu Ile Thr Arg Thr Thr Leu Gln Ser Asp Gln Glu Glu Ile 
775 780 785 

GAC TAT GAT GAT ACC ATA TCA GTT GAA ATG AAG AAG GAA GAT TTT GAC 3414 
Asp Tyr Asp Asp Thr Ile Ser Val Glu Met Lys Lys Glu Asp Phe Asp 
790 795 800 
ATT TAT GAT GAG GAT GAA AAT CAG AGC CCC CGC AGC TTT CAA AAG AAA 3462 
Ile Tyr Asp Glu Asp Glu Asn Gln Ser Pro Arg Ser Phe Gln Lys Lys 
805 810 815 

ACA CGA CAC TAT TTT ATT GCT GCA GTG GAG AGG CTC TGG GAT TAT GGG 3510 
Thr Arg His Tyr Phe Ile Ala Ala Val Glu Arg Leu Trp Asp Tyr Gly 
820 825 830 835 

ATG AGT AGC TCC CCA CAT GTT CTA AGA AAC AGG GCT CAG AGT GGC AGT 3558 
Met Ser Ser Ser Pro His Val Leu Arg Asn Arg Ala Gln Ser Gly Ser 
840 845 850 

GTC CCT CAG TTC AAG AAA GTT GTT TTC CAG GAA TTT ACT GAT GGC TCC 3606 
Val Pro Gln Phe Lys Lys Val Val Phe Gln Glu Phe Thr Asp Gly Ser 
855 860 865 

TTT ACT CAG CCC TTA TAC CGT GGA GAA CTA AAT GAA CAT TTG GGA CTC 3654 
Phe Thr Gln Pro Leu Tyr Arg Gly Glu Leu Asn Glu His Leu Gly Leu 
870 875 880 

CTG GGG CCA TAT ATA AGA GCA GAA GTT GAA GAT AAT ATC ATG GTA ACT 3702 
Leu Gly Pro Tyr Ile Arg Ala Glu Val Glu Asp Asn Ile Met Val Thr 
885 890 895 

TTC AGA AAT CAG GCC TCT CGT CCC TAT TCC TTC TAT TCT AGC CTT ATT 3750 
Phe Arg Asn Gln Ala Ser Arg Pro Tyr Ser Phe Tyr Ser Ser Leu Ile 
900 905 910 915 

TCT TAT GAG GAA GAT CAG AGG CAA GGA GCA GAA CCT AGA AAA AAC TTT 3798 
Ser Tyr Glu Glu Asp Gln Arg Gln Gly Ala Glu Pro Arg Lys Asn Phe 
920 925 930 

GTC AAG CCT AAT GAA ACC AAA ACT TAC TTT TGG AAA GTG CAA CAT CAT 3846 
Val Lys Pro Asn Glu Thr Lys Thr Tyr Phe Trp Lys Val Gln His His 
935 940 945 

ATG GCA CCC ACT AAA GAT GAG TTT GAC TGC AAA GCC TGG GCT TAT TTC 3894 
Met Ala Pro Thr Lys Asp Glu Phe Asp Cys Lys Ala Trp Ala Tyr Phe 
950 955 960 

TCT GAT GTT GAC CTG GAA AAA GAT GTG CAC TCA GGC CTG ATT GGA CCC 3942 
Ser Asp Val Asp Leu Glu Lys Asp Val His Ser Gly Leu Ile Gly Pro 
965 970 975 

CTT CTG GTC TGC CAC ACT AAC ACA CTG AAC CCT GCT CAT GGG AGA CAA 3990 
Leu Leu Val Cys His Thr Asn Thr Leu Asn Pro Ala His Gly Arg Gln 
980 985 990 995 

GTG ACA GTA CAG GAA TTT GCT CTG TTT TTC ACC ATC TTT GAT GAG ACC 4038 
Val Thr Val Gln Glu Phe Ala Leu Phe Phe Thr Ile Phe Asp Glu Thr 
1000 1005 1010 

AAA AGC TGG TAC TTC ACT GAA AAT ATG GAA AGA AAC TGC AGG GCT CCC 4086 
Lys Ser Trp Tyr Phe Thr Glu Asn Met Glu Arg Asn Cys Arg Ala Pro 
1015 1020 1025 
TGC AAT ATC CAG ATG GAA GAT CCC ACT TTT AAA GAG AAT TAT CGC TTC 4134 
Cys Asn Ile Gln Met Glu Asp Pro Thr Phe Lys Glu Asn Tyr Arg Phe 
1030 1035 1040 

CAT GCA ATC AAT GGC TAC ATA ATG GAT ACA CTA CCT GGC TTA GTA ATG 4182 
His Ala Ile Asn Gly Tyr Ile Met Asp Thr Leu Pro Gly Leu Val Met 
1045 1050 1055 

GCT CAG GAT CAA AGG ATT CGA TGG TAT CTG CTC AGC ATG GGC AGC AAT 4230 
Ala Gln Asp Gln Arg Ile Arg Trp Tyr Leu Leu Ser Met Gly Ser Asn 
1060 1065 1070 1075 

GAA AAC ATC CAT TCT ATT CAT TTC AGT GGA CAT GTG TTC ACT GTA CGA 4278 
Glu Asn Ile His Ser Ile His Phe Ser Gly His Val Phe Thr Val Arg 
1080 1085 1090 

AAA AAA GAG GAG TAT AAA ATG GCA CTG TAC AAT CTC TAT CCA GGT GTT 4326 
Lys Lys Glu Glu Tyr Lys Met Ala Leu Tyr Asn Leu Tyr Pro Gly Val 
1095 1100 1105 

TTT GAG ACA GTG GAA ATG TTA CCA TCC AAA GCT GGA ATT TGG CGG GTG 4374 
Phe Glu Thr Val Glu Met Leu Pro Ser Lys Ala Gly Ile Trp Arg Val 
1110 1115 1120 

GAA TGC CTT ATT GGC GAG CAT CTA CAT GCT GGG ATG AGC ACA CTT TTT 4422 
Glu Cys Leu Ile Gly Glu His Leu His Ala Gly Met Ser Thr Leu Phe 
1125 1130 1135 

CTG GTG TAC AGC AAT AAG TGT CAG ACT CCC CTG GGA ATG GCT TCT GGA 4470 
Leu Val Tyr Ser Asn Lys Cys Gln Thr Pro Leu Gly Met Ala Ser Gly 
1140 1145 1150 1155 

CAC ATT AGA GAT TTT CAG ATT ACA GCT TCA GGA CAA TAT GGA CAG TGG 4518 
His Ile Arg Asp Phe Gln Ile Thr Ala Ser Gly Gln Tyr Gly Gln Trp 
1160 1165 1170 

GCC CCA AAG CTG GCC AGA CTT CAT TAT TCC GGA TCA ATC AAT GCC TGG 4566 
Ala Pro Lys Leu Ala Arg Leu His Tyr Ser Gly Ser Ile Asn Ala Trp 
1175 1180 1185 

AGC ACC AAG GAG CCC TTT TCT TGG ATC AAG GTG GAT CTG TTG GCA CCA 4614 
Ser Thr Lys Glu Pro Phe Ser Trp Ile Lys Val Asp Leu Leu Ala Pro 
1190 1195 1200 

ATG ATT ATT CAC GGC ATC AAG ACC CAG GGT GCC CGT CAG AAG TTC TCC 4662 
Met Ile Ile His Gly Ile Lys Thr Gln Gly Ala Arg Gln Lys Phe Ser 
1205 1210 1215 

AGC CTC TAC ATC TCT CAG TTT ATC ATC ATG TAT AGT CTT GAT GGG AAG 4710 
Ser Leu Tyr Ile Ser Gln Phe Ile Ile Met Tyr Ser Leu Asp Gly Lys 
1220 1225 1230 1235 

AAG TGG CAG ACT TAT CGA GGA AAT TCC ACT GGA ACC TTA ATG GTC TTC 4758 
Lys Trp Gln Thr Tyr Arg Gly Asn Ser Thr Gly Thr Leu Met Val Phe 
1240 1245 1250 

TTT GGC AAT GTG GAT TCA TCT GGG ATA AAA CAC AAT ATT TTT AAC CCT 4806 
Phe Gly Asn Val Asp Ser Ser Gly Ile Lys His Asn Ile Phe Asn Pro 
1255 1260 1265 

CCA ATT ATT GCT CGA TAC ATC CGT TTG CAC CCA ACT CAT TAT AGC ATT 4854 
Pro Ile Ile Ala Arg Tyr Ile Arg Leu His Pro Thr His Tyr Ser Ile 
1270 1275 1280 

CGC AGC ACT CTT CGC ATG GAG TTG ATG GGC TGT GAT TTA AAT AGT TGC 4902 
Arg Ser Thr Leu Arg Met Glu Leu Met Gly Cys Asp Leu Asn Ser Cys 
1285 1290 1295 

AGC ATG CCA TTG GGA ATG GAG AGT AAA GCA ATA TCA GAT GCA CAG ATT 4950 
Ser Met Pro Leu Gly Met Glu Ser Lys Ala Ile Ser Asp Ala Gln Ile 
1300 1305 1310 1315 

ACT GCT TCA TCC TAC TTT ACC AAT ATG TTT GCC ACC TGG TCT CCT TCA 4998 
Thr Ala Ser Ser Tyr Phe Thr Asn Met Phe Ala Thr Trp Ser Pro Ser 
1320 1325 1330 

AAA GCT CGA CTT CAC CTC CAA GGG AGG AGT AAT GCC TGG AGA CCT CAG 5046 
Lys Ala Arg Leu His Leu Gln Gly Arg Ser Asn Ala Trp Arg Pro Gln 
1335 1340 1345 

GTG AAT AAT CCA AAA GAG TGG CTG CAA GTG GAC TTC CAG AAG ACA ATG 5094 
Val Asn Asn Pro Lys Glu Trp Leu Gln Val Asp Phe Gln Lys Thr Met 
1350 1355 1360 

AAA GTC ACA GGA GTA ACT ACT CAG GGA GTA AAA TCT CTG CTT ACC AGC 5142 
Lys Val Thr Gly Val Thr Thr Gln Gly Val Lys Ser Leu Leu Thr Ser 
1365 1370 1375 

ATG TAT GTG AAG GAG TTC CTC ATC TCC AGC AGT CAA GAT GGC CAT CAG 5190 
Met Tyr Val Lys Glu Phe Leu Ile Ser Ser Ser Gln Asp Gly His Gln 
1380 1385 1390 1395 

TGG ACT CTC TTT TTT CAG AAT GGC AAA GTA AAG GTT TTT CAG GGA AAT 5238 
Trp Thr Leu Phe Phe Gln Asn Gly Lys Val Lys Val Phe Gln Gly Asn 
1400 1405 1410 

CAA GAC TCC TTC ACA CCT GTG GTG AAC TCT CTA GAC CCA CCG TTA CTG 5286 
Gln Asp Ser Phe Thr Pro Val Val Asn Ser Leu Asp Pro Pro Leu Leu 
1415 1420 1425 

ACT CGC TAC CTT CGA ATT CAC CCC CAG AGT TGG GTG CAC CAG ATT GCC 5334 
Thr Arg Tyr Leu Arg Ile His Pro Gln Ser Trp Val His Gln Ile Ala 
1430 1435 1440 

CTG AGG ATG GAG GTT CTG GGC TGC GAG GCA CAG GAC CTC TAC 5376 
Leu Arg Met Glu Val Leu Gly Cys Glu Ala Gln Asp Leu Tyr 
1445 1450 1455 

TGAGGGTGGC CACTGCAGCA CCTGCCACTG CCGTCACCTC TCCCTCCTCA GCTCCAGGGC 5436 

AGTGTCCCTC CCTGGCTTGC CTTCTACCTT TGTGCTAAAT CCTAGCAGAC ACTGCCTTGA 5496 

AGCCTCCTGA ATTAACTATC ATCAGTCCTG CATTTCTTTG GTGGGGGGCC AGGAGGGTGC 5556 

ATCCAATTTA ACTTAACTCT TACCTATTTT CTGCAGCTGC TCCCAGATTA CTCCTTCCTT 5616 

CCAATATAAC TAGGCAAAAA GAAGTGAGGA GAAACCTGCA TGAAAGCATT CTTCCCTGAA 5676 

AAGTTAGGCC TCTCAGAGTC ACCACTTCCT CTGTTGTAGA AAAACTATGT GATGAAACTT 5736 

TGAAAAAGAT ATTTATGATG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG 5796 

CAATAGCATC ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT 5856 

GTCCAAACTC ATCAATGTAT CTTATCATGT CTGGATCCCC GGGTGGCATC CCTGTGACCC 5916 

CTCCCCAGTG CCTCTCCTGG CCCTGGAAGT TGCCACTCCA GTGCCCACCA GCCTTGTCCT 5976 

AATAAAATTA AGTTGCATCA TTTTGTCTGA CTAGGTGTCC TTCTATAATA TTATGGGGTG 6036 

GAGGGGGGTG GTATGGAGCA AGGGGCAAGT TGGGAAGACA ACCTGTAGGG CCTGCGGGGT 6096 

CTATTCGGGA ACCAAGCTGG AGTGCAGTGG CACAATCTTG GCTCACTGCA ATCTCCGCCT 6156 

CCTGGGTTCA AGCGATTCTC CTGCCTCAGC CTCCCGAGTT GTTGGGATTC CAGGCATGCA 6216 

TGACCAGGCT CAGCTAATTT TTGTTTTTTT GGTAGAGACG GGGTTTCACC ATATTGGCCA 6276 

GGCTGGTCTC CAACTCCTAA TCTCAGGTGA TCTACCCACC TTGGCCTCCC AAATTGCTGG 6336 

GATTACAGGC GTGAACCACT GCTCCCTTCC CTGTCCTTCT GATTTTAAAA TAACTATACC 6396 

AGCAGGAGGA CGTCCAGACA CAGCATAGGC TACCTGCCAT GCCCAACCGG TGGGACATTT 6456 

GAGTTGCTTG CTTGGCACTG TCCTCTCATG CGTTGGGTCC ACTCAGTAGA TGCCTGTTGA 6516 

ATTCGTAATC ATGGTCATAG CTGTTTCCTG TGTGAAATTG TTATCCGCTC ACAATTCCAC 6576 

ACAACATACG AGCCGGAAGC ATAAAGTGTA AAGCCTGGGG TGCCTAATGA GTGAGCTAAC 6636 

TCACATTAAT TGCGTTGCGC TCACTGCCCG CTTTCCAGTC GGGAAACCTG TCGTGCCAGC 6696 

TGCATTAATG AATCGGCCAA CGCGCGGGGA GAGGCGGTTT GCGTATTGGG CGCTCTTCCG 6756 

CTTCCTCGCT CACTGACTCG CTGCGCTCGG TCGTTCGGCT GCGGCGAGCG GTATCAGCTC 6816 

ACTCAAAGGC GGTAATACGG TTATCCACAG AATCAGGGGA TAACGCAGGA AAGAACATGT 6876 

GAGCAAAAGG CCAGCAAAAG GCCAGGAACC GTAAAAAGGC CGCGTTGCTG GCGTTTTTCC 6936 

ATAGGCTCCG CCCCCCTGAC GAGCATCACA AAAATCGACG CTCAAGTCAG AGGTGGCGAA 6996 

ACCCGACAGG ACTATAAAGA TACCAGGCGT TTCCCCCTGG AAGCTCCCTC GTGCGCTCTC 7056 

CTGTTCCGAC CCTGCCGCTT ACCGGATACC TGTCCGCCTT TCTCCCTTCG GGAAGCGTGG 7116 

CGCTTTCTCA TAGCTCACGC TGTAGGTATC TCAGTTCGGT GTAGGTCGTT CGCTCCAAGC 7176 

TGGGCTGTGT GCACGAACCC CCCGTTCAGC CCGACCGCTG CGCCTTATCC GGTAACTATC 7236 

GTCTTGAGTC CAACCCGGTA AGACACGACT TATCGCCACT GGCAGCAGCC ACTGGTAACA 7296 

GGATTAGCAG AGCGAGGTAT GTAGGCGGTG CTACAGAGTT CTTGAAGTGG TGGCCTAACT 7356 

ACGGCTACAC TAGAAGGACA GTATTTGGTA TCTGCGCTCT GCTGAAGCCA GTTACCTTCG 7416 

GAAAAAGAGT TGGTAGCTCT TGATCCGGCA AACAAACCAC CGCTGGTAGC GGTGGTTTTT 7476 

TTGTTTGCAA GCAGCAGATT ACGCGCAGAA AAAAAGGATC TCAAGAAGAT CCTTTGATCT 7536 

TTTCTACGGG GTCTGACGCT CAGTGGAACG AAAACTCACG TTAAGGGATT TTGGTCATGA 7596 

GATTATCAAA AAGGATCTTC ACCTAGATCC TTTTAAATTA AAAATGAAGT TTTAAATCAA 7656 

TCTAAAGTAT ATATGAGTAA ACTTGGTCTG ACAGTTACCA ATGCTTAATC AGTGAGGCAC 7716 

CTATCTCAGC GATCTGTCTA TTTCGTTCAT CCATAGTTGC CTGACTCCCC GTCGTGTAGA 7776 

TAACTACGAT ACGGGAGGGC TTACCATCTG GCCCCAGTGC TGCAATGATA CCGCGAGACC 7836 

CACGCTCACC GGCTCCAGAT TTATCAGCAA TAAACCAGCC AGCCGGAAGG GCCGAGCGCA 7896 

GAAGTGGTCC TGCAACTTTA TCCGCCTCCA TCCAGTCTAT TAATTGTTGC CGGGAAGCTA 7956 

GAGTAAGTAG TTCGCCAGTT AATAGTTTGC GCAACGTTGT TGCCATTGCT ACAGGCATCG 8016 

TGGTGTCACG CTCGTCGTTT GGTATGGCTT CATTCAGCTC CGGTTCCCAA CGATCAAGGC 8076 

GAGTTACATG ATCCCCCATG TTGTGCAAAA AAGCGGTTAG CTCCTTCGGT CCTCCGATCG 8136 

TTGTCAGAAG TAAGTTGGCC GCAGTGTTAT CACTCATGGT TATGGCAGCA CTGCATAATT 8196 

CTCTTACTGT CATGCCATCC GTAAGATGCT TTTCTGTGAC TGGTGAGTAC TCAACCAAGT 8256 

CATTCTGAGA ATAGTGTATG CGGCGACCGA GTTGCTCTTG CCCGGCGTCA ATACGGGATA 8316 

ATACCGCGCC ACATAGCAGA ACTTTAAAAG TGCTCATCAT TGGAAAACGT TCTTCGGGGC 8376 

GAAAACTCTC AAGGATCTTA CCGCTGTTGA GATCCAGTTC GATGTAACCC ACTCGTGCAC 8436 

CCAACTGATC TTCAGCATCT TTTACTTTCA CCAGCGTTTC TGGGTGAGCA AAAACAGGAA 8496 

GGCAAAATGC CGCAAAAAAG GGAATAAGGG CGACACGGAA ATGTTGAATA CTCATACTCT 8556 

TCCTTTTTCA ATATTATTGA AGCATTTATC AGGGTTATTG TCTCATGAGC GGATACATAT 8616 

TTGAATGTAT TTAGAAAAAT AAACAAATAG GGGTTCCGCG CACATTTCCC CGAAAAGTGC 8676 

CACCTGACGT CTAAGAAACC ATTATTATCA TGACATTAAC CTATAAAAAT AGGCGTATCA 8736 

CGAGGCCCTT TCGTCTCGCG CGTTTCGGTG ATGACGGTGA AAACCTCTGA CACATGCAGC 8796 

TCCCGGAGAC GGTCACAGCT TGTCTGTAAG CGGATGCCGG GAGCAGACAA GCCCGTCAGG 8856 

GCGCGTCAGC GGGTGTTGGC GGGTGTCGGG GCTGGCTTAA CTATGCGGCA TCAGAGCAGA 8916 

TTGTACTGAG AGTGCACCAT ATGCGGTGTG AAATACCGCA CAGATGCGTA AGGAGAAAAT 8976 

ACCGCATCAG GCGCCATTCG CCATTCAGGC TGCGCAACTG TTGGGAAGGG CGATCGGTGC 9036 

GGGCCTCTTC GCTATTACGC CAGCTGGCGA AAGGGGGATG TGCTGCAAGG CGATTAAGTT 9096 

GGGTAACGCC AGGGTTTTCC CAGTCACGAC GTTGTAAAAC GACGGCCAGT GCCAAGCTTG 9156 

GGCTGCAG 9164 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12022 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1006..3294 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 6153..8234 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

GTCGACGGTA TCGATAAGCT TGATATCGAA TTCCTGCAGC CCGGGGGATC CACTAGTACT 60 

CGAGACCTAG GAGTTAATTT TTAAAAAGCA GTCAAAAGTC CAAGTGGCCC TTGCGAGCAT 120 

TTACTCTCTC TGTTTGCTCT GGTTAATAAT CTCAGGAGCA CAAACATTCC TTACTAGTCC 180 

TAGAAGTTAA TTTTTAAAAA GCAGTCAAAA GTCCAAGTGG CCCTTGCGAG CATTTACTCT 240 

CTCTGTTTGC TCTGGTTAAT AATCTCAGGA GCACAAACAT TCCTTACTAG TTCTAGAGCG 300 

GCCGCCAGTG TGCTGGAATT CGGCTTTTTT AGGGCTGGAA GCTACCTTTG ACATCATTTC 360 

CTCTGCGAAT GCATGTATAA TTTCTACAGA ACCTATTAGA AAGGATCACC CAGCCTCTGC 420 

TTTTGTACAA CTTTCCCTTA AAAAACTGCC AATTCCACTG CTGTTTGGCC CAATAGTGAG 480 

AACTTTTTCC TGCTGCCTCT TGGTGCTTTT GCCTATGGCC CCTATTCTGC CTGCTGAAGA 540 

CACTCTTGCC AGCATGGACT TAAACCCCTC CAGCTCTGAC AATCCTCTTT CTCTTTTGTT 600 

TTACATGAAG GGTCTGGCAG CCAAAGCAAT CACTCAAAGT TCAAACCTTA TCATTTTTTG 660 

CTTTGTTCCT CTTGGCCTTG GTTTTGTACA TCAGCTTTGA AAATACCATC CCAGGGTTAA 720 

TGCTGGGGTT AATTTATAAC TAAGAGTGCT CTAGTTTTGC AATACAGGAC ATGCTATAAA 780 

AATGGAAAGA TGTTGCTTTC TGAGAGATCT CGAGGAAGCT AACAACAAAG AACAACAAAC 840 

AACAATCAGG TAAGTATCCT TTTTACAGCA CAACTTAATG AGACAGATAG AAACTGGTCT 900 

TGTAGAAACA GAGTAGTCGC CTGCTTTTCT GCCAGGTGCT GACTTCTCTC CCCTTCTCTT 960 
TTTTCCTTTT CTCAGGATAA CAAGAACGAA ACAATAACAG CCACC ATG GAA ATA 1014 
Met Glu Ile 
1 

GAG CTC TCC ACC TGC TTC TTT CTG TGC CTT TTG CGA TTC TGC TTT AGT 1062 
Glu Leu Ser Thr Cys Phe Phe Leu Cys Leu Leu Arg Phe Cys Phe Ser 
5 10 15 

GCC ACC AGA AGA TAC TAC CTG GGT GCA GTG GAA CTG TCA TGG GAC TAT 1110 
Ala Thr Arg Arg Tyr Tyr Leu Gly Ala Val Glu Leu Ser Trp Asp Tyr 
20 25 30 35 

ATG CAA AGT GAT CTC GGT GAG CTG CCT GTG GAC GCA AGA TTT CCT CCT 1158 
Met Gln Ser Asp Leu Gly Glu Leu Pro Val Asp Ala Arg Phe Pro Pro 
40 45 50 

AGA GTG CCA AAA TCT TTT CCA TTC AAC ACC TCA GTC GTG TAC AAA AAG 1206 
Arg Val Pro Lys Ser Phe Pro Phe Asn Thr Ser Val Val Tyr Lys Lys 
55 60 65 

ACT CTG TTT GTA GAA TTC ACG GTT CAC CTT TTC AAC ATC GCT AAG CCA 1254 
Thr Leu Phe Val Glu Phe Thr Val His Leu Phe Asn Ile Ala Lys Pro 
70 75 80 

AGG CCA CCC TGG ATG GGT CTG CTA GGT CCT ACC ATC CAG GCT GAG GTT 1302 
Arg Pro Pro Trp Met Gly Leu Leu Gly Pro Thr Ile Gln Ala Glu Val 
85 90 95 

TAT GAT ACA GTG GTC ATT ACA CTT AAG AAC ATG GCT TCC CAT CCT GTC 1350 
Tyr Asp Thr Val Val Ile Thr Leu Lys Asn Met Ala Ser His Pro Val 
100 105 110 115 

AGT CTT CAT GCT GTT GGT GTA TCC TAC TGG AAA GCT TCT GAG GGA GCT 1398 
Ser Leu His Ala Val Gly Val Ser Tyr Trp Lys Ala Ser Glu Gly Ala 
120 125 130 

GAA TAT GAT GAT CAG ACC AGT CAA AGG GAG AAA GAA GAT GAT AAA GTC 1446 
Glu Tyr Asp Asp Gln Thr Ser Gln Arg Glu Lys Glu Asp Asp Lys Val 
135 140 145 

TTC CCT GGT GGA AGC CAT ACA TAT GTC TGG CAG GTC CTG AAA GAG AAT 1494 
Phe Pro Gly Gly Ser His Thr Tyr Val Trp Gln Val Leu Lys Glu Asn 
150 155 160 

GGT CCA ATG GCC TCT GAC CCA CTG TGC CTT ACC TAC TCA TAT CTT TCT 1542 
Gly Pro Met Ala Ser Asp Pro Leu Cys Leu Thr Tyr Ser Tyr Leu Ser 
165 170 175 

CAT GTG GAC CTG GTA AAA GAC TTG AAT TCA GGC CTC ATT GGA GCC CTA 1590 
His Val Asp Leu Val Lys Asp Leu Asn Ser Gly Leu Ile Gly Ala Leu 
180 185 190 195 

CTA GTA TGT AGA GAA GGG AGT CTG GCC AAG GAA AAG ACA CAG ACC TTG 1638 
Leu Val Cys Arg Glu Gly Ser Leu Ala Lys Glu Lys Thr Gln Thr Leu 
200 205 210 

CAC AAA TTT ATA CTA CTT TTT GCT GTA TTT GAT GAA GGG AAA AGT TGG 1686 
His Lys Phe Ile Leu Leu Phe Ala Val Phe Asp Glu Gly Lys Ser Trp 
215 220 225 

CAC TCA GAA ACA AAG AAC TCC TTG ATG CAG GAT AGG GAT GCT GCA TCT 1734 
His Ser Glu Thr Lys Asn Ser Leu Met Gln Asp Arg Asp Ala Ala Ser 
230 235 240 

GCT CGG GCC TGG CCT AAA ATG CAC ACA GTC AAT GGT TAT GTA AAC AGG 1782 
Ala Arg Ala Trp Pro Lys Met His Thr Val Asn Gly Tyr Val Asn Arg 
245 250 255 

TCT CTG CCA GGT CTG ATT GGA TGC CAC AGG AAA TCA GTC TAT TGG CAT 1830 
Ser Leu Pro Gly Leu Ile Gly Cys His Arg Lys Ser Val Tyr Trp His 
260 265 270 275 

GTG ATT GGA ATG GGC ACC ACT CCT GAA GTG CAC TCA ATA TTC CTC GAA 1878 
Val Ile Gly Met Gly Thr Thr Pro Glu Val His Ser Ile Phe Leu Glu 
280 285 290 

GGT CAC ACA TTT CTT GTG AGG AAC CAT CGC CAG GCG TCC TTG GAA ATC 1926 
Gly His Thr Phe Leu Val Arg Asn His Arg Gln Ala Ser Leu Glu Ile 
295 300 305 

TCG CCA ATA ACT TTC CTT ACT GCT CAA ACA CTC TTG ATG GAC CTT GGA 1974 
Ser Pro Ile Thr Phe Leu Thr Ala Gln Thr Leu Leu Met Asp Leu Gly 
310 315 320 

CAG TTT CTA CTG TTT TGT CAT ATC TCT TCC CAC CAA CAT GAT GGC ATG 2022 
Gln Phe Leu Leu Phe Cys His Ile Ser Ser His Gln His Asp Gly Met 
325 330 335 

GAA GCT TAT GTC AAA GTA GAC AGC TGT CCA GAG GAA CCC CAA CTA CGA 2070 
Glu Ala Tyr Val Lys Val Asp Ser Cys Pro Glu Glu Pro Gln Leu Arg 
340 345 350 355 

ATG AAA AAT AAT GAA GAA GCG GAA GAC TAT GAT GAT GAT CTT ACT GAT 2118 
Met Lys Asn Asn Glu Glu Ala Glu Asp Tyr Asp Asp Asp Leu Thr Asp 
360 365 370 

TCT GAA ATG GAT GTG GTC AGG TTT GAT GAT GAC AAC TCT CCT TCC TTT 2166 
Ser Glu Met Asp Val Val Arg Phe Asp Asp Asp Asn Ser Pro Ser Phe 
375 380 385 

ATC CAA ATT CGC TCA GTT GCC AAG AAG CAT CCT AAA ACT TGG GTA CAT 2214 
Ile Gln Ile Arg Ser Val Ala Lys Lys His Pro Lys Thr Trp Val His 
390 395 400 

TAC ATT GCT GCT GAA GAG GAG GAC TGG GAC TAT GCT CCC TTA GTC CTC 2262 
Tyr Ile Ala Ala Glu Glu Glu Asp Trp Asp Tyr Ala Pro Leu Val Leu 
405 410 415 

GCC CCC GAT GAC AGA AGT TAT AAA AGT CAA TAT TTG AAC AAT GGC CCT 2310 
Ala Pro Asp Asp Arg Ser Tyr Lys Ser Gln Tyr Leu Asn Asn Gly Pro 
420 425 430 435 

CAG CGG ATT GGT AGG AAG TAC AAA AAA GTC CGA TTT ATG GCA TAC ACA 2358 
Gln Arg Ile Gly Arg Lys Tyr Lys Lys Val Arg Phe Met Ala Tyr Thr 
440 445 450 

GAT GAA ACC TTT AAG ACT CGT GAA GCT ATT CAG CAT GAA TCA GGA ATC 2406 
Asp Glu Thr Phe Lys Thr Arg Glu Ala Ile Gln His Glu Ser Gly Ile 
455 460 465 

TTG GGA CCT TTA CTT TAT GGG GAA GTT GGA GAC ACA CTG TTG ATT ATA 2454 
Leu Gly Pro Leu Leu Tyr Gly Glu Val Gly Asp Thr Leu Leu Ile Ile 
470 475 480 

TTT AAG AAT CAA GCA AGC AGA CCA TAT AAC ATC TAC CCT CAC GGA ATC 2502 
Phe Lys Asn Gln Ala Ser Arg Pro Tyr Asn Ile Tyr Pro His Gly Ile 
485 490 495 

ACT GAT GTC CGT CCT TTG TAT TCA AGG AGA TTA CCA AAA GGT GTA AAA 2550 
Thr Asp Val Arg Pro Leu Tyr Ser Arg Arg Leu Pro Lys Gly Val Lys 
500 505 510 515 

CAT TTG AAG GAT TTT CCA ATT CTG CCA GGA GAA ATA TTC AAA TAT AAA 2598 
His Leu Lys Asp Phe Pro Ile Leu Pro Gly Glu Ile Phe Lys Tyr Lys 
520 525 530 

TGG ACA GTG ACT GTA GAA GAT GGG CCA ACT AAA TCA GAT CCT CGG TGC 2646 
Trp Thr Val Thr Val Glu Asp Gly Pro Thr Lys Ser Asp Pro Arg Cys 
535 540 545 

CTG ACC CGC TAT TAC TCT AGT TTC GTT AAT ATG GAG AGA GAT CTA GCT 2694 
Leu Thr Arg Tyr Tyr Ser Ser Phe Val Asn Met Glu Arg Asp Leu Ala 
550 555 560 

TCA GGA CTC ATT GGC CCT CTC CTC ATC TGC TAC AAA GAA TCT GTA GAT 2742 
Ser Gly Leu Ile Gly Pro Leu Leu Ile Cys Tyr Lys Glu Ser Val Asp 
565 570 575 

CAA AGA GGA AAC CAG ATA ATG TCA GAC AAG AGG AAT GTC ATC CTG TTT 2790 
Gln Arg Gly Asn Gln Ile Met Ser Asp Lys Arg Asn Val Ile Leu Phe 
580 585 590 595 

TCT GTA TTT GAT GAG AAC CGA AGC TGG TAC CTC ACA GAG AAT ATA CAA 2838 
Ser Val Phe Asp Glu Asn Arg Ser Trp Tyr Leu Thr Glu Asn Ile Gln 
600 605 610 

CGC TTT CTC CCC AAT CCA GCT GGA GTG CAG CTT GAG GAT CCA GAG TTC 2886 
Arg Phe Leu Pro Asn Pro Ala Gly Val Gln Leu Glu Asp Pro Glu Phe 
615 620 625 

CAA GCC TCC AAC ATC ATG CAC AGC ATC AAT GGC TAT GTT TTT GAT AGT 2934 
Gln Ala Ser Asn Ile Met His Ser Ile Asn Gly Tyr Val Phe Asp Ser 
630 635 640 

TTG CAG TTG TCA GTT TGT TTG CAT GAG GTG GCA TAC TGG TAC ATT CTA 2982 
Leu Gln Leu Ser Val Cys Leu His Glu Val Ala Tyr Trp Tyr Ile Leu 
645 650 655 

AGC ATT GGA GCA CAG ACT GAC TTC CTT TCT GTC TTC TTC TCT GGA TAT 3030 
Ser Ile Gly Ala Gln Thr Asp Phe Leu Ser Val Phe Phe Ser Gly Tyr 
660 665 670 675 

ACC TTC AAA CAC AAA ATG GTC TAT GAA GAC ACA CTC ACC CTA TTC CCA 3078 
Thr Phe Lys His Lys Met Val Tyr Glu Asp Thr Leu Thr Leu Phe Pro 
680 685 690 

TTC TCA GGA GAA ACT GTC TTC ATG TCG ATG GAA AAC CCA GGT CTA TGG 3126 
Phe Ser Gly Glu Thr Val Phe Met Ser Met Glu Asn Pro Gly Leu Trp 
695 700 705 

ATT CTG GGG TGC CAC AAC TCA GAC TTT CGG AAC AGA GGC ATG ACC GCC 3174 
Ile Leu Gly Cys His Asn Ser Asp Phe Arg Asn Arg Gly Met Thr Ala 
710 715 720 

TTA CTG AAG GTT TCT AGT TGT GAC AAG AAC ACT GGT GAT TAT TAC GAG 3222 
Leu Leu Lys Val Ser Ser Cys Asp Lys Asn Thr Gly Asp Tyr Tyr Glu 
725 730 735 

GAC AGT TAT GAA GAT ATT TCA GCA TAC TTG CTG AGT AAA AAC AAT GCC 3270 
Asp Ser Tyr Glu Asp Ile Ser Ala Tyr Leu Leu Ser Lys Asn Asn Ala 
740 745 750 755 

ATT GAA CCA AGA AGC TTC TCC CAG GTAAGTTATT ATATAAATTC AAGACACCCT 3324 
Ile Glu Pro Arg Ser Phe Ser Gln 
760 

AGCACTAGGC AAAAGCAATT TAATGCCACC ACAATTCCAG AAAATGACAT AGAGAAGACT 3384 

GACCCTTGGT TTGCACACAG AACACCTATG CCTAAAATAC AAAATGTCTC CTCTAGTGAT 3444 

TTGTTGATGC TCTTGCGACA GAGTCCTACT CCACATGGGC TATCCTTATC TGATCTCCAA 3504 

GAAGCCAAAT ATGAGACTTT TTCTGATGAT CCATCACCTG GAGCAATAGA CAGTAATAAC 3564 

AGCCTGTCTG AAATGACACA CTTCAGGCCA CAGCTCCATC ACAGTGGGGA CATGGTATTT 3624 

ACCCCTGAGT CAGGCCTCCA ATTAAGATTA AATGAGAAAC TGGGGACAAC TGCAGCAACA 3684 

GAGTTGAAGA AACTTGATTT CAAAGTTTCT AGTACATCAA ATAATCTGAT TTCAACAATT 3744 

CCATCAGACA ATTTGGCAGC AGGTACTGAT AATACAAGTT CCTTAGGACC CCCAAGTATG 3804 

CCAGTTCATT ATGATAGTCA ATTAGATACC ACTCTATTTG GCAAAAAGTC ATCTCCCCTT 3864 

ACTGAGTCTG GTGGACCTCT GAGCTTGAGT GAAGAAAATA ATGATTCAAA GTTGTTAGAA 3924 

TCAGGTTTAA TGAATAGCCA AGAAAGTTCA TGGGGAAAAA ATGTATCGTC AACAGAGAGT 3984 

GGTAGGTTAT TTAAAGGGAA AAGAGCTCAT GGACCTGCTT TGTTGACTAA AGATAATGCC 4044 

TTATTCAAAG TTAGCATCTC TTTGTTAAAG ACAAACAAAA CTTCCAATAA TTCAGCAACT 4104 

AATAGAAAGA CTCACATTGA TGGCCCATCA TTATTAATTG AGAATAGTCC ATCAGTCTGG 4164 

CAAAATATAT TAGAAAGTGA CACTGAGTTT AAAAAAGTGA CACCTTTGAT TCATGACAGA 4224 

ATGCTTATGG ACAAAAATGC TACAGCTTTG AGGCTAAATC ATATGTCAAA TAAAACTACT 4284 

TCATCAAAAA ACATGGAAAT GGTCCAACAG AAAAAAGAGG GCCCCATTCC ACCAGATGCA 4344 

CAAAATCCAG ATATGTCGTT CTTTAAGATG CTATTCTTGC CAGAATCAGC AAGGTGGATA 4404 

CAAAGGACTC ATGGAAAGAA CTCTCTGAAC TCTGGGCAAG GCCCCAGTCC AAAGCAATTA 4464 

GTATCCTTAG GACCAGAAAA ATCTGTGGAA GGTCAGAATT TCTTGTCTGA GAAAAACAAA 4524 

GTGGTAGTAG GAAAGGGTGA ATTTACAAAG GACGTAGGAC TCAAAGAGAT GGTTTTTCCA 4584 

AGCAGCAGAA ACCTATTTCT TACTAACTTG GATAATTTAC ATGAAAATAA TACACACAAT 4644 

CAAGAAAAAA AAATTCAGGA AGAAATAGAA AAGAAGGAAA CATTAATCCA AGAGAATGTA 4704 

GTTTTGCCTC AGATACATAC AGTGACTGGC ACTAAGAATT TCATGAAGAA CCTTTTCTTA 4764 

CTGAGCACTA GGCAAAATGT AGAAGGTTCA TATGAGGGGG CATATGCTCC AGTACTTCAA 4824 

GATTTTAGGT CATTAAATGA TTCAACAAAT AGAACAAAGA AACACACAGC TCATTTCTCA 4884 

AAAAAAGGGG AGGAAGAAAA CTTGGAAGGC TTGGGAAATC AAACCAAGCA AATTGTAGAG 4944 

AAATATGCAT GCACCACAAG GATATCTCCT AATACAAGCC AGCAGAATTT TGTCACGCAA 5004 

CGTAGTAAGA GAGCTTTGAA ACAATTCAGA CTCCCACTAG AAGAAACAGA ACTTGAAAAA 5064 

AGGATAATTG TGGATGACAC CTCAACCCAG TGGTCCAAAA ACATGAAACA TTTGACCCCG 5124 

AGCACCCTCA CACAGATAGA CTACAATGAG AAGGAGAAAG GGGCCATTAC TCAGTCTCCC 5184 

TTATCAGATT GCCTTACGAG GAGTCATAGC ATCCCTCAAG CAAATAGATC TCCATTACCC 5244 

ATTGCAAAGG TATCATCATT TCCATCTATT AGACCTATAT ATCTGACCAG GGTCCTATTC 5304 

CAAGACAACT CTTCTCATCT TCCAGCAGCA TCTTATAGAA AGAAAGATTC TGGGGTCCAA 5364 

GAAAGCAGTC ATTTCTTACA AGGAGCCAAA AAAAATAACC TTTCTTTAGC CATTCTAACC 5424 

TTGGAGATGA CTGGTGATCA AAGAGAGGTT GGCTCCCTGG GGACAAGTGC CACAAATTCA 5484 

GTCACATACA AGAAAGTTGA GAACACTGTT CTCCCGAAAC CAGACTTGCC CAAAACATCT 5544 

GGCAAAGTTG AATTGCTTCC AAAAGTTCAC ATTTATCAGA AGGACCTATT CCCTACGGAA 5604 

ACTAGCAATG GGTCTCCTGG CCATCTGGAT CTCGTGGAAG GGAGCCTTCT TCAGGGAACA 5664 

GAGGGAGCGA TTAAGTGGAA TGAAGCAAAC AGACCTGGAA AAGTTCCCTT TCTGAGAGTA 5724 

GCAACAGAAA GCTCTGCAAA GACTCCCTCC AAGCTATTGG ATCCTCTTGC TTGGGATAAC 5784 

CACTATGGTA CTCAGATACC AAAAGAAGAG TGGAAATCCC AAGAGAAGTC ACCAGAAAAA 5844 

ACAGCTTTTA AGAAAAAGGA TACCATTTTG TCCCTGAACG CTTGTGAAAG CAATCATGCA 5904 

ATAGCAGCAA TAAATGAGGG ACAAAATAAG CCCGAAATAG AAGTCACCTG GGCAAAGCAA 5964 

GGTAGGACTG AAAGGCTGTG CTCTCAATTG TGCTAATAAA GCTTGGCAAG AGTATTTCAA 6024 

GGAAGATGAA GTCATTAACT ATGCAAAATG CTTCTCAGGC ACCTAGGAAA ATGAGGATGT 6084 

GAGGCATTTC TACCCACTTG GTACATAAAA TTATTGGGTC ACCCTTTTCC TCTTCTTTTT 6144 

TTCTCCAG AAC CCA CCA GTC TTG AAA CGC CAT CAA CGG GAA ATA ACT CGT 6194 
Asn Pro Pro Val Leu Lys Arg His Gln Arg Glu Ile Thr Arg 
1 5 10 

ACT ACT CTT CAG TCA GAT CAA GAG GAA ATT GAC TAT GAT GAT ACC ATA 6242 
Thr Thr Leu Gln Ser Asp Gln Glu Glu Ile Asp Tyr Asp Asp Thr Ile 
15 20 25 30 

TCA GTT GAA ATG AAG AAG GAA GAT TTT GAC ATT TAT GAT GAG GAT GAA 6290 
Ser Val Glu Met Lys Lys Glu Asp Phe Asp Ile Tyr Asp Glu Asp Glu 
35 40 45 

AAT CAG AGC CCC CGC AGC TTT CAA AAG AAA ACA CGA CAC TAT TTT ATT 6338 
Asn Gln Ser Pro Arg Ser Phe Gln Lys Lys Thr Arg His Tyr Phe Ile 
50 55 60 

GCT GCA GTG GAG AGG CTC TGG GAT TAT GGG ATG AGT AGC TCC CCA CAT 6386 
Ala Ala Val Glu Arg Leu Trp Asp Tyr Gly Met Ser Ser Ser Pro His 
65 70 75 

GTT CTA AGA AAC AGG GCT CAG AGT GGC AGT GTC CCT CAG TTC AAG AAA 6434 
Val Leu Arg Asn Arg Ala Gln Ser Gly Ser Val Pro Gln Phe Lys Lys 
80 85 90 

GTT GTT TTC CAG GAA TTT ACT GAT GGC TCC TTT ACT CAG CCC TTA TAC 6482 
Val Val Phe Gln Glu Phe Thr Asp Gly Ser Phe Thr Gln Pro Leu Tyr 
95 100 105 110 

CGT GGA GAA CTA AAT GAA CAT TTG GGA CTC CTG GGG CCA TAT ATA AGA 6530 
Arg Gly Glu Leu Asn Glu His Leu Gly Leu Leu Gly Pro Tyr Ile Arg 
115 120 125 

GCA GAA GTT GAA GAT AAT ATC ATG GTA ACT TTC AGA AAT CAG GCC TCT 6578 
Ala Glu Val Glu Asp Asn Ile Met Val Thr Phe Arg Asn Gln Ala Ser 
130 135 140 

CGT CCC TAT TCC TTC TAT TCT AGC CTT ATT TCT TAT GAG GAA GAT CAG 6626 
Arg Pro Tyr Ser Phe Tyr Ser Ser Leu Ile Ser Tyr Glu Glu Asp Gln 
145 150 155 

AGG CAA GGA GCA GAA CCT AGA AAA AAC TTT GTC AAG CCT AAT GAA ACC 6674 
Arg Gln Gly Ala Glu Pro Arg Lys Asn Phe Val Lys Pro Asn Glu Thr 
160 165 170 

AAA ACT TAC TTT TGG AAA GTG CAA CAT CAT ATG GCA CCC ACT AAA GAT 6722 
Lys Thr Tyr Phe Trp Lys Val Gln His His Met Ala Pro Thr Lys Asp 
175 180 185 190 

GAG TTT GAC TGC AAA GCC TGG GCT TAT TTC TCT GAT GTT GAC CTG GAA 6770 
Glu Phe Asp Cys Lys Ala Trp Ala Tyr Phe Ser Asp Val Asp Leu Glu 
195 200 205 

AAA GAT GTG CAC TCA GGC CTG ATT GGA CCC CTT CTG GTC TGC CAC ACT 6818 
Lys Asp Val His Ser Gly Leu Ile Gly Pro Leu Leu Val Cys His Thr 
210 215 220 

AAC ACA CTG AAC CCT GCT CAT GGG AGA CAA GTG ACA GTA CAG GAA TTT 6866 
Asn Thr Leu Asn Pro Ala His Gly Arg Gln Val Thr Val Gln Glu Phe 
225 230 235 

GCT CTG TTT TTC ACC ATC TTT GAT GAG ACC AAA AGC TGG TAC TTC ACT 6914 
Ala Leu Phe Phe Thr Ile Phe Asp Glu Thr Lys Ser Trp Tyr Phe Thr 
240 245 250 

GAA AAT ATG GAA AGA AAC TGC AGG GCT CCC TGC AAT ATC CAG ATG GAA 6962 
Glu Asn Met Glu Arg Asn Cys Arg Ala Pro Cys Asn Ile Gln Met Glu 
255 260 265 270 

GAT CCC ACT TTT AAA GAG AAT TAT CGC TTC CAT GCA ATC AAT GGC TAC 7010 
Asp Pro Thr Phe Lys Glu Asn Tyr Arg Phe His Ala Ile Asn Gly Tyr 
275 280 285 

ATA ATG GAT ACA CTA CCT GGC TTA GTA ATG GCT CAG GAT CAA AGG ATT 7058 
Ile Met Asp Thr Leu Pro Gly Leu Val Met Ala Gln Asp Gln Arg Ile 
290 295 300 

CGA TGG TAT CTG CTC AGC ATG GGC AGC AAT GAA AAC ATC CAT TCT ATT 7106 
Arg Trp Tyr Leu Leu Ser Met Gly Ser Asn Glu Asn Ile His Ser Ile 
305 310 315 

CAT TTC AGT GGA CAT GTG TTC ACT GTA CGA AAA AAA GAG GAG TAT AAA 7154 
His Phe Ser Gly His Val Phe Thr Val Arg Lys Lys Glu Glu Tyr Lys 
320 325 330 

ATG GCA CTG TAC AAT CTC TAT CCA GGT GTT TTT GAG ACA GTG GAA ATG 7202 
Met Ala Leu Tyr Asn Leu Tyr Pro Gly Val Phe Glu Thr Val Glu Met 
335 340 345 350 

TTA CCA TCC AAA GCT GGA ATT TGG CGG GTG GAA TGC CTT ATT GGC GAG 7250 
Leu Pro Ser Lys Ala Gly Ile Trp Arg Val Glu Cys Leu Ile Gly Glu 
355 360 365 

CAT CTA CAT GCT GGG ATG AGC ACA CTT TTT CTG GTG TAC AGC AAT AAG 7298 
His Leu His Ala Gly Met Ser Thr Leu Phe Leu Val Tyr Ser Asn Lys 
370 375 380 

TGT CAG ACT CCC CTG GGA ATG GCT TCT GGA CAC ATT AGA GAT TTT CAG 7346 
Cys Gln Thr Pro Leu Gly Met Ala Ser Gly His Ile Arg Asp Phe Gln 
385 390 395 

ATT ACA GCT TCA GGA CAA TAT GGA CAG TGG GCC CCA AAG CTG GCC AGA 7394 
Ile Thr Ala Ser Gly Gln Tyr Gly Gln Trp Ala Pro Lys Leu Ala Arg 
400 405 410 

CTT CAT TAT TCC GGA TCA ATC AAT GCC TGG AGC ACC AAG GAG CCC TTT 7442 
Leu His Tyr Ser Gly Ser Ile Asn Ala Trp Ser Thr Lys Glu Pro Phe 
415 420 425 430 

TCT TGG ATC AAG GTG GAT CTG TTG GCA CCA ATG ATT ATT CAC GGC ATC 7490 
Ser Trp Ile Lys Val Asp Leu Leu Ala Pro Met Ile Ile His Gly Ile 
435 440 445 

AAG ACC CAG GGT GCC CGT CAG AAG TTC TCC AGC CTC TAC ATC TCT CAG 7538 
Lys Thr Gln Gly Ala Arg Gln Lys Phe Ser Ser Leu Tyr Ile Ser Gln 
450 455 460 

TTT ATC ATC ATG TAT AGT CTT GAT GGG AAG AAG TGG CAG ACT TAT CGA 7586 
Phe Ile Ile Met Tyr Ser Leu Asp Gly Lys Lys Trp Gln Thr Tyr Arg 
465 470 475 

GGA AAT TCC ACT GGA ACC TTA ATG GTC TTC TTT GGC AAT GTG GAT TCA 7634 
Gly Asn Ser Thr Gly Thr Leu Met Val Phe Phe Gly Asn Val Asp Ser 
480 485 490 

TCT GGG ATA AAA CAC AAT ATT TTT AAC CCT CCA ATT ATT GCT CGA TAC 7682 
Ser Gly Ile Lys His Asn Ile Phe Asn Pro Pro Ile Ile Ala Arg Tyr 
495 500 505 510 

ATC CGT TTG CAC CCA ACT CAT TAT AGC ATT CGC AGC ACT CTT CGC ATG 7730 
Ile Arg Leu His Pro Thr His Tyr Ser Ile Arg Ser Thr Leu Arg Met 
515 520 525 

GAG TTG ATG GGC TGT GAT TTA AAT AGT TGC AGC ATG CCA TTG GGA ATG 7778 
Glu Leu Met Gly Cys Asp Leu Asn Ser Cys Ser Met Pro Leu Gly Met 
530 535 540 

GAG AGT AAA GCA ATA TCA GAT GCA CAG ATT ACT GCT TCA TCC TAC TTT 7826 
Glu Ser Lys Ala Ile Ser Asp Ala Gln Ile Thr Ala Ser Ser Tyr Phe 
545 550 555 

ACC AAT ATG TTT GCC ACC TGG TCT CCT TCA AAA GCT CGA CTT CAC CTC 7874 
Thr Asn Met Phe Ala Thr Trp Ser Pro Ser Lys Ala Arg Leu His Leu 
560 565 570 

CAA GGG AGG AGT AAT GCC TGG AGA CCT CAG GTG AAT AAT CCA AAA GAG 7922 
Gln Gly Arg Ser Asn Ala Trp Arg Pro Gln Val Asn Asn Pro Lys Glu 
575 580 585 590 

TGG CTG CAA GTG GAC TTC CAG AAG ACA ATG AAA GTC ACA GGA GTA ACT 7970 
Trp Leu Gln Val Asp Phe Gln Lys Thr Met Lys Val Thr Gly Val Thr 
595 600 605 

ACT CAG GGA GTA AAA TCT CTG CTT ACC AGC ATG TAT GTG AAG GAG TTC 8018 
Thr Gln Gly Val Lys Ser Leu Leu Thr Ser Met Tyr Val Lys Glu Phe 
610 615 620 

CTC ATC TCC AGC AGT CAA GAT GGC CAT CAG TGG ACT CTC TTT TTT CAG 8066 
Leu Ile Ser Ser Ser Gln Asp Gly His Gln Trp Thr Leu Phe Phe Gln 
625 630 635 

AAT GGC AAA GTA AAG GTT TTT CAG GGA AAT CAA GAC TCC TTC ACA CCT 8114 
Asn Gly Lys Val Lys Val Phe Gln Gly Asn Gln Asp Ser Phe Thr Pro 
640 645 650 

GTG GTG AAC TCT CTA GAC CCA CCG TTA CTG ACT CGC TAC CTT CGA ATT 8162 
Val Val Asn Ser Leu Asp Pro Pro Leu Leu Thr Arg Tyr Leu Arg Ile 
655 660 665 670 

CAC CCC CAG AGT TGG GTG CAC CAG ATT GCC CTG AGG ATG GAG GTT CTG 8210 
His Pro Gln Ser Trp Val His Gln Ile Ala Leu Arg Met Glu Val Leu 
675 680 685 

GGC TGC GAG GCA CAG GAC CTC TAC TGAGGGTGGC CACTGCAGCA CCTGCCACTG 8264 
Gly Cys Glu Ala Gln Asp Leu Tyr 
690 

CCGTCACCTC TCCCTCCTCA GCTCCAGGGC AGTGTCCCTC CCTGGCTTGC CTTCTACCTT 8324 

TGTGCTAAAT CCTAGCAGAC ACTGCCTTGA AGCCTCCTGA ATTAACTATC ATCAGTCCTG 8384 

CATTTCTTTG GTGGGGGGCC AGGAGGGTGC ATCCAATTTA ACTTAACTCT TACCTATTTT 8444 

CTGCAGCTGC TCCCAGATTA CTCCTTCCTT CCAATATAAC TAGGCAAAAA GAAGTGAGGA 8504 

GAAACCTGCA TGAAAGCATT CTTCCCTGAA AAGTTAGGCC TCTCAGAGTC ACCACTTCCT 8564 

CTGTTGTAGA AAAACTATGT GATGAAACTT TGAAAAAGAT ATTTATGATG TTAACTTGTT 8624 

TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC ACAAATTTCA CAAATAAAGC 8684 

ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC ATCAATGTAT CTTATCATGT 8744 

CTGGATCCCC GGGTGGCATC CCTGTGACCC CTCCCCAGTG CCTCTCCTGG CCCTGGAAGT 8804 

TGCCACTCCA GTGCCCACCA GCCTTGTCCT AATAAAATTA AGTTGCATCA TTTTGTCTGA 8864 

CTAGGTGTCC TTCTATAATA TTATGGGGTG GAGGGGGGTG GTATGGAGCA AGGGGCAAGT 8924 

TGGGAAGACA ACCTGTAGGG CCTGCGGGGT CTATTCGGGA ACCAAGCTGG AGTGCAGTGG 8984 

CACAATCTTG GCTCACTGCA ATCTCCGCCT CCTGGGTTCA AGCGATTCTC CTGCCTCAGC 9044 

CTCCCGAGTT GTTGGGATTC CAGGCATGCA TGACCAGGCT CAGCTAATTT TTGTTTTTTT 9104 

GGTAGAGACG GGGTTTCACC ATATTGGCCA GGCTGGTCTC CAACTCCTAA TCTCAGGTGA 9164 

TCTACCCACC TTGGCCTCCC AAATTGCTGG GATTACAGGC GTGAACCACT GCTCCCTTCC 9224 

CTGTCCTTCT GATTTTAAAA TAACTATACC AGCAGGAGGA CGTCCAGACA CAGCATAGGC 9284 

TACCTGCCAT GCCCAACCGG TGGGACATTT GAGTTGCTTG CTTGGCACTG TCCTCTCATG 9344 

CGTTGGGTCC ACTCAGTAGA TGCCTGTTGA ATTCGTAATC ATGGTCATAG CTGTTTCCTG 9404 

TGTGAAATTG TTATCCGCTC ACAATTCCAC ACAACATACG AGCCGGAAGC ATAAAGTGTA 9464 

AAGCCTGGGG TGCCTAATGA GTGAGCTAAC TCACATTAAT TGCGTTGCGC TCACTGCCCG 9524 

CTTTCCAGTC GGGAAACCTG TCGTGCCAGC TGCATTAATG AATCGGCCAA CGCGCGGGGA 9584 

GAGGCGGTTT GCGTATTGGG CGCTCTTCCG CTTCCTCGCT CACTGACTCG CTGCGCTCGG 9644 
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TCGTTCGGCT GCGGCGAGCG GTATCAGCTC ACTCAAAGGC GGTAATACGG TTATCCACAG 9704 
AATCAGGGGA TAACGCAGGA AAGAACATGT GAGCAAAAGG CCAGCAAAAG GCCAGGAACC 97 64 
GTAAAAAGGC CGCGTTGCTG GCGTTTTTCC ATAGGCTCCG CCCCCCTGAC GAGCATCACA 9824 
AAAATCGACG CTCAAGTCAG AGGTGGCGAA ACCCGACAGG ACTATAAAGA TACCAGGCGT 98 8 4 
TTCCCCCTGG AAGCTCCCTC GTGCGCTCTC CTGTTCCGAC CCTGCCGCTT ACCGGATACC 9 94 4 
TGTCCGCCTT TCTCCCTTCG GGAAGCGTGG CGCTTTCTCA TAGCTCACGC TGTAGGTATC 10 00 4 
TCAGTTCGGT GTAGGTCGTT CGCTCCAAGC TGGGCTGTGT GCACGAACCC CCCGTTCAGC 10064 
CCGACCGCTG CGCCTTATCC GGTAACTATC GTCTTGAGTC CAACCCGGTA AGACACGACT 10124 
TATCGCCACT GGCAGCAGCC ACTGGTAACA GGATTAGCAG AGCGAGGTAT GTAGGCGGTG 10184
CTACAGAGTT CTTGAAGTGG TGGCCTAACT ACGGCTACAC TAGAAGGACA GTATTTGGTA 10244 
TCTGCGCTCT GCTGAAGCCA GTTACCTTCG GAAAAAGAGT TGGTAGCTCT TGATCCGGCA 10304 
AACAAACCAC CGCTGGTAGC GGTGGTTTTT TTGTTTGCAA GCAGCAGATT ACGCGCAGAA 10364 
AAAAAGGATC TCAAGAAGAT CCTTTGATCT TTTCTACGGG GTCTGACGCT CAGTGGAACG 10424
AAAACTCACG TTAAGGGATT TTGGTCATGA GATTATCAAA AAGGATCTTC ACCTAGATCC 10484
TTTTAAATTA AAAATGAAGT TTTAAATCAA TCTAAAGTAT ATATGAGTAA ACTTGGTCTG 10544
ACAGTTACCA ATGCTTAATC AGTGAGGCAC CTATCTCAGC GATCTGTCTA TTTCGTTCAT 10604
CCATAGTTGC CTGACTCCCC GTCGTGTAGA TAACTACGAT ACGGGAGGGC TTACCATCTG 10664
GCCCCAGTGC TGCAATGATA CCGCGAGACC CACGCTCACC GGCTCCAGAT TTATCAGCAA 10724 
TAAACCAGCC AGCCGGAAGG GCCGAGCGCA GAAGTGGTCC TGCAACTTTA TCCGCCTCCA 10784 
TCCAGTCTAT TAATTGTTGC CGGGAAGCTA GAGTAAGTAG TTCGCCAGTT AATAGTTTGC 10844 
GCAACGTTGT TGCCATTGCT ACAGGCATCG TGGTGTCACG CTCGTCGTTT GGTATGGCTT 10904 
CATTCAGCTC CGGTTCCCAA CGATCAAGGC GAGTTACATG ATCCCCCATG TTGTGCAAAA 10964
AAGCGGTTAG CTCCTTCGGT CCTCCGATCG TTGTCAGAAG TAAGTTGGCC GCAGTGTTAT 11024
CACTCATGGT TATGGCAGCA CTGCATAATT CTCTTACTGT CATGCCATCC GTAAGATGCT 11084
TTTCTGTGAC TGGTGAGTAC TCAACCAAGT CATTCTGAGA ATAGTGTATG CGGCGACCGA 11144
GTTGCTCTTG CCCGGCGTCA ATACGGGATA ATACCGCGCC ACATAGCAGA ACTTTAAAAG 11204
TGCTCATCAT TGGAAAACGT TCTTCGGGGC GAAAACTCTC AAGGATCTTA CCGCTGTTGA 11264
GATCCAGTTC GATGTAACCC ACTCGTGCAC CCAACTGATC TTCAGCATCT TTTACTTTCA 11324
CCAGCGTTTC TGGGTGAGCA AAAACAGGAA GGCAAAATGC CGCAAAAAAG GGAATAAGGG 11384
CGACACGGAA ATGTTGAATA CTCATACTCT TCCTTTTTCA ATATTATTGA AGCATTTATC 11444
AGGGTTATTG TCTCATGAGC GGATACATAT TTGAATGTAT TTAGAAAAAT AAACAAATAG 11504 
GGGTTCCGCG CACATTTCCC CGAAAAGTGC CACCTGACGT CTAAGAAACC ATTATTATCA 11564 
TGACATTAAC CTATAAAAAT AGGCGTATCA CGAGGCCCTT TCGTCTCGCG CGTTTCGGTG 11624 
ATGACGGTGA AAACCTCTGA CACATGCAGC TCCCGGAGAC GGTCACAGCT TGTCTGTAAG 11684 
CGGATGCCGG GAGCAGACAA GCCCGTCAGG GCGCGTCAGC GGGTGTTGGC GGGTGTCGGG 11744 
GCTGGCTTAA CTATGCGGCA TCAGAGCAGA TTGTACTGAG AGTGCACCAT ATGCGGTGTG 11804 
AAATACCGCA CAGATGCGTA AGGAGAAAAT ACCGCATCAG GCGCCATTCG CCATTCAGGC 11864
TGCGCAACTG TTGGGAAGGG CGATCGGTGC GGGCCTCTTC GCTATTACGC CAGCTGGCGA 11924 
AAGGGGGATG TGCTGCAAGG CGATTAAGTT GGGTAACGCC AGGGTTTTCC CAGTCACGAC 11984 
GTTGTAAAAC GACGGCCAGT GCCAAGCTTG GGCTGCAG 12022 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11846 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 1006..8058

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

GTCGACGGTA TCGATAAGCT TGATATCGAA TTCCTGCAGC CCGGGGGATC CACTAGTACT 60 

CGAGACCTAG GAGTTAATTT TTAAAAAGCA GTCAAAAGTC CAAGTGGCCC TTGCGAGCAT 120 

TTACTCTCTC TGTTTGCTCT GGTTAATAAT CTCAGGAGCA CAAACATTCC TTACTAGTCC 180 

TAGAAGTTAA TTTTTAAAAA GCAGTCAAAA GTCCAAGTGG CCCTTGCGAG CATTTACTCT 240

CTCTGTTTGC TCTGGTTAAT AATCTCAGGA GCACAAACAT TCCTTACTAG TTCTAGAGCG 300 

GCCGCCAGTG TGCTGGAATT CGGCTTTTTT AGGGCTGGAA GCTACCTTTG ACATCATTTC 360 

CTCTGCGAAT GCATGTATAA TTTCTACAGA ACCTATTAGA AAGGATCACC CAGCCTCTGC 420 

TTTTGTACAA CTTTCCCTTA AAAAACTGCC AATTCCACTG CTGTTTGGCC CAATAGTGAG 480 

AACTTTTTCC TGCTGCCTCT TGGTGCTTTT GCCTATGGCC CCTATTCTGC CTGCTGAAGA 540

CACTCTTGCC AGCATGGACT TAAACCCCTC CAGCTCTGAC AATCCTCTTT CTCTTTTGTT 600

TTACATGAAG GGTCTGGCAG CCAAAGCAAT CACTCAAAGT TCAAACCTTA TCATTTTTTG 660

CTTTGTTCCT CTTGGCCTTG GTTTTGTACA TCAGCTTTGA AAATACCATC CCAGGGTTAA 720 

TGCTGGGGTT AATTTATAAC TAAGAGTGCT CTAGTTTTGC AATACAGGAC ATGCTATAAA 780 

AATGGAAAGA TGTTGCTTTC TGAGAGATCT CGAGGAAGCT AACAACAAAG AACAACAAAC 840

AACAATCAGG TAAGTATCCT TTTTACAGCA CAACTTAATG AGACAGATAG AAACTGGTCT 900 

TGTAGAAACA GAGTAGTCGC CTGCTTTTCT GCCAGGTGCT GACTTCTCTC CCCTTCTCTT 960 

TTTTCCTTTT CTCAGGATAA CAAGAACGAA ACAATAACAG CCACC ATG GAA ATA 1014 

Met Glu Ile
1 

GAG CTC TCC ACC TGC TTC TTT CTG TGC CTT TTG CGA TTC TGC TTT AGT 1062 
Glu Leu Ser Thr Cys Phe Phe Leu Cys Leu Leu Arg Phe Cys Phe Ser 
5 10 15 

GCC ACC AGA AGA TAC TAC CTG GGT GCA GTG GAA CTG TCA TGG GAC TAT 1110 
Ala Thr Arg Arg Tyr Tyr Leu Gly Ala Val Glu Leu Ser Trp Asp Tyr 

20 25 30 35 

ATG CAA AGT GAT CTC GGT GAG CTG CCT GTG GAC GCA AGA TTT CCT CCT 1158
Met Gln Ser Asp Leu Gly Glu Leu Pro Val Asp Ala Arg Phe Pro Pro
40 45 50

AGA GTG CCA AAA TCT TTT CCA TTC AAC ACC TCA GTC GTG TAC AAA AAG 1206
Arg Val Pro Lys Ser Phe Pro Phe Asn Thr Ser Val Val Tyr Lys Lys
55 60 65

ACT CTG TTT GTA GAA TTC ACG GTT CAC CTT TTC AAC ATC GCT AAG CCA 1254
Thr Leu Phe Val Glu Phe Thr Val His Leu Phe Asn Ile Ala Lys Pro
70 75 80

AGG CCA CCC TGG ATG GGT CTG CTA GGT CCT ACC ATC CAG GCT GAG GTT 1302
Arg Pro Pro Trp Met Gly Leu Leu Gly Pro Thr Ile Gln Ala Glu Val
85 90 95

TAT GAT ACA GTG GTC ATT ACA CTT AAG AAC ATG GCT TCC CAT CCT GTC 1350
Tyr Asp Thr Val Val Ile Thr Leu Lys Asn Met Ala Ser His Pro Val
100 105 110 115

AGT CTT CAT GCT GTT GGT GTA TCC TAC TGG AAA GCT TCT GAG GGA GCT 1398
Ser Leu His Ala Val Gly Val Ser Tyr Trp Lys Ala Ser Glu Gly Ala
120 125 130

GAA TAT GAT GAT CAG ACC AGT CAA AGG GAG AAA GAA GAT GAT AAA GTC 1446
Glu Tyr Asp Asp Gln Thr Ser Gln Arg Glu Lys Glu Asp Asp Lys Val
135 140 145

TTC CCT GGT GGA AGC CAT ACA TAT GTC TGG CAG GTC CTG AAA GAG AAT 1494
Phe Pro Gly Gly Ser His Thr Tyr Val Trp Gln Val Leu Lys Glu Asn
150 155 160

GGT CCA ATG GCC TCT GAC CCA CTG TGC CTT ACC TAC TCA TAT CTT TCT 1542
Gly Pro Met Ala Ser Asp Pro Leu Cys Leu Thr Tyr Ser Tyr Leu Ser
165 170 175

CAT GTG GAC CTG GTA AAA GAC TTG AAT TCA GGC CTC ATT GGA GCC CTA 1590
His Val Asp Leu Val Lys Asp Leu Asn Ser Gly Leu Ile Gly Ala Leu
180 185 190 195

CTA GTA TGT AGA GAA GGG AGT CTG GCC AAG GAA AAG ACA CAG ACC TTG 1638
Leu Val Cys Arg Glu Gly Ser Leu Ala Lys Glu Lys Thr Gln Thr Leu
200 205 210

CAC AAA TTT ATA CTA CTT TTT GCT GTA TTT GAT GAA GGG AAA AGT TGG 1686
His Lys Phe Ile Leu Leu Phe Ala Val Phe Asp Glu Gly Lys Ser Trp
215 220 225

CAC TCA GAA ACA AAG AAC TCC TTG ATG CAG GAT AGG GAT GCT GCA TCT 1734
His Ser Glu Thr Lys Asn Ser Leu Met Gln Asp Arg Asp Ala Ala Ser
230 235 240

GCT CGG GCC TGG CCT AAA ATG CAC ACA GTC AAT GGT TAT GTA AAC AGG 1782
Ala Arg Ala Trp Pro Lys Met His Thr Val Asn Gly Tyr Val Asn Arg
245 250 255

TCT CTG CCA GGT CTG ATT GGA TGC CAC AGG AAA TCA GTC TAT TGG CAT 1830
Ser Leu Pro Gly Leu Ile Gly Cys His Arg Lys Ser Val Tyr Trp His
260 265 270 275

GTG ATT GGA ATG GGC ACC ACT CCT GAA GTG CAC TCA ATA TTC CTC GAA 1878
Val Ile Gly Met Gly Thr Thr Pro Glu Val His Ser Ile Phe Leu Glu
280 285 290

GGT CAC ACA TTT CTT GTG AGG AAC CAT CGC CAG GCG TCC TTG GAA ATC 1926
Gly His Thr Phe Leu Val Arg Asn His Arg Gln Ala Ser Leu Glu Ile
295 300 305

TCG CCA ATA ACT TTC CTT ACT GCT CAA ACA CTC TTG ATG GAC CTT GGA 1974
Ser Pro Ile Thr Phe Leu Thr Ala Gln Thr Leu Leu Met Asp Leu Gly
310 315 320

CAG TTT CTA CTG TTT TGT CAT ATC TCT TCC CAC CAA CAT GAT GGC ATG 2022
Gln Phe Leu Leu Phe Cys His Ile Ser Ser His Gln His Asp Gly Met
325 330 335

GAA GCT TAT GTC AAA GTA GAC AGC TGT CCA GAG GAA CCC CAA CTA CGA 2070
Glu Ala Tyr Val Lys Val Asp Ser Cys Pro Glu Glu Pro Gln Leu Arg
340 345 350 355

ATG AAA AAT AAT GAA GAA GCG GAA GAC TAT GAT GAT GAT CTT ACT GAT 2118
Met Lys Asn Asn Glu Glu Ala Glu Asp Tyr Asp Asp Asp Leu Thr Asp
360 365 370

TCT GAA ATG GAT GTG GTC AGG TTT GAT GAT GAC AAC TCT CCT TCC TTT 2166
Ser Glu Met Asp Val Val Arg Phe Asp Asp Asp Asn Ser Pro Ser Phe
375 380 385

ATC CAA ATT CGC TCA GTT GCC AAG AAG CAT CCT AAA ACT TGG GTA CAT 2214
Ile Gln Ile Arg Ser Val Ala Lys Lys His Pro Lys Thr Trp Val His
390 395 400

TAC ATT GCT GCT GAA GAG GAG GAC TGG GAC TAT GCT CCC TTA GTC CTC 2262
Tyr Ile Ala Ala Glu Glu Glu Asp Trp Asp Tyr Ala Pro Leu Val Leu
405 410 415

GCC CCC GAT GAC AGA AGT TAT AAA AGT CAA TAT TTG AAC AAT GGC CCT 2310
Ala Pro Asp Asp Arg Ser Tyr Lys Ser Gln Tyr Leu Asn Asn Gly Pro
420 425 430 435

CAG CGG ATT GGT AGG AAG TAC AAA AAA GTC CGA TTT ATG GCA TAC ACA 2358
Gln Arg Ile Gly Arg Lys Tyr Lys Lys Val Arg Phe Met Ala Tyr Thr
440 445 450

GAT GAA ACC TTT AAG ACT CGT GAA GCT ATT CAG CAT GAA TCA GGA ATC 2406
Asp Glu Thr Phe Lys Thr Arg Glu Ala Ile Gln His Glu Ser Gly Ile
455 460 465

TTG GGA CCT TTA CTT TAT GGG GAA GTT GGA GAC ACA CTG TTG ATT ATA 2454
Leu Gly Pro Leu Leu Tyr Gly Glu Val Gly Asp Thr Leu Leu Ile Ile
470 475 480

TTT AAG AAT CAA GCA AGC AGA CCA TAT AAC ATC TAC CCT CAC GGA ATC 2502
Phe Lys Asn Gln Ala Ser Arg Pro Tyr Asn Ile Tyr Pro His Gly Ile
485 490 495

ACT GAT GTC CGT CCT TTG TAT TCA AGG AGA TTA CCA AAA GGT GTA AAA 2550
Thr Asp Val Arg Pro Leu Tyr Ser Arg Arg Leu Pro Lys Gly Val Lys
500 505 510 515

CAT TTG AAG GAT TTT CCA ATT CTG CCA GGA GAA ATA TTC AAA TAT AAA 2598
His Leu Lys Asp Phe Pro Ile Leu Pro Gly Glu Ile Phe Lys Tyr Lys
520 525 530

TGG ACA GTG ACT GTA GAA GAT GGG CCA ACT AAA TCA GAT CCT CGG TGC 2646
Trp Thr Val Thr Val Glu Asp Gly Pro Thr Lys Ser Asp Pro Arg Cys
535 540 545

CTG ACC CGC TAT TAC TCT AGT TTC GTT AAT ATG GAG AGA GAT CTA GCT 2694
Leu Thr Arg Tyr Tyr Ser Ser Phe Val Asn Met Glu Arg Asp Leu Ala
550 555 560

TCA GGA CTC ATT GGC CCT CTC CTC ATC TGC TAC AAA GAA TCT GTA GAT 2742
Ser Gly Leu Ile Gly Pro Leu Leu Ile Cys Tyr Lys Glu Ser Val Asp
565 570 575

CAA AGA GGA AAC CAG ATA ATG TCA GAC AAG AGG AAT GTC ATC CTG TTT 2790
Gln Arg Gly Asn Gln Ile Met Ser Asp Lys Arg Asn Val Ile Leu Phe
580 585 590 595

TCT GTA TTT GAT GAG AAC CGA AGC TGG TAC CTC ACA GAG AAT ATA CAA 2838
Ser Val Phe Asp Glu Asn Arg Ser Trp Tyr Leu Thr Glu Asn Ile Gln
600 605 610

CGC TTT CTC CCC AAT CCA GCT GGA GTG CAG CTT GAG GAT CCA GAG TTC 2886
Arg Phe Leu Pro Asn Pro Ala Gly Val Gln Leu Glu Asp Pro Glu Phe
615 620 625

CAA GCC TCC AAC ATC ATG CAC AGC ATC AAT GGC TAT GTT TTT GAT AGT 2934
Gln Ala Ser Asn Ile Met His Ser Ile Asn Gly Tyr Val Phe Asp Ser
630 635 640

TTG CAG TTG TCA GTT TGT TTG CAT GAG GTG GCA TAC TGG TAC ATT CTA 2982
Leu Gln Leu Ser Val Cys Leu His Glu Val Ala Tyr Trp Tyr Ile Leu
645 650 655

AGC ATT GGA GCA CAG ACT GAC TTC CTT TCT GTC TTC TTC TCT GGA TAT 3030
Ser Ile Gly Ala Gln Thr Asp Phe Leu Ser Val Phe Phe Ser Gly Tyr
660 665 670 675

ACC TTC AAA CAC AAA ATG GTC TAT GAA GAC ACA CTC ACC CTA TTC CCA 3078
Thr Phe Lys His Lys Met Val Tyr Glu Asp Thr Leu Thr Leu Phe Pro
680 685 690

TTC TCA GGA GAA ACT GTC TTC ATG TCG ATG GAA AAC CCA GGT CTA TGG 3126
Phe Ser Gly Glu Thr Val Phe Met Ser Met Glu Asn Pro Gly Leu Trp
695 700 705

ATT CTG GGG TGC CAC AAC TCA GAC TTT CGG AAC AGA GGC ATG ACC GCC 3174
Ile Leu Gly Cys His Asn Ser Asp Phe Arg Asn Arg Gly Met Thr Ala
710 715 720

TTA CTG AAG GTT TCT AGT TGT GAC AAG AAC ACT GGT GAT TAT TAC GAG 3222
Leu Leu Lys Val Ser Ser Cys Asp Lys Asn Thr Gly Asp Tyr Tyr Glu
725 730 735

GAC AGT TAT GAA GAT ATT TCA GCA TAC TTG CTG AGT AAA AAC AAT GCC 3270
Asp Ser Tyr Glu Asp Ile Ser Ala Tyr Leu Leu Ser Lys Asn Asn Ala
740 745 750 755

ATT GAA CCA AGA AGC TTC TCC CAG AAT TCA AGA CAC CCT AGC ACT AGG 3318
Ile Glu Pro Arg Ser Phe Ser Gln Asn Ser Arg His Pro Ser Thr Arg
760 765 770

CAA AAG CAA TTT AAT GCC ACC ACA ATT CCA GAA AAT GAC ATA GAG AAG 3366
Gln Lys Gln Phe Asn Ala Thr Thr Ile Pro Glu Asn Asp Ile Glu Lys
775 780 785

ACT GAC CCT TGG TTT GCA CAC AGA ACA CCT ATG CCT AAA ATA CAA AAT 3414
Thr Asp Pro Trp Phe Ala His Arg Thr Pro Met Pro Lys Ile Gln Asn
790 795 800

GTC TCC TCT AGT GAT TTG TTG ATG CTC TTG CGA CAG AGT CCT ACT CCA 3462
Val Ser Ser Ser Asp Leu Leu Met Leu Leu Arg Gln Ser Pro Thr Pro
805 810 815

CAT GGG CTA TCC TTA TCT GAT CTC CAA GAA GCC AAA TAT GAG ACT TTT 3510
His Gly Leu Ser Leu Ser Asp Leu Gln Glu Ala Lys Tyr Glu Thr Phe
820 825 830 835

TCT GAT GAT CCA TCA CCT GGA GCA ATA GAC AGT AAT AAC AGC CTG TCT 3558
Ser Asp Asp Pro Ser Pro Gly Ala Ile Asp Ser Asn Asn Ser Leu Ser
840 845 850

GAA ATG ACA CAC TTC AGG CCA CAG CTC CAT CAC AGT GGG GAC ATG GTA 3606
Glu Met Thr His Phe Arg Pro Gln Leu His His Ser Gly Asp Met Val
855 860 865

TTT ACC CCT GAG TCA GGC CTC CAA TTA AGA TTA AAT GAG AAA CTG GGG 3654
Phe Thr Pro Glu Ser Gly Leu Gln Leu Arg Leu Asn Glu Lys Leu Gly
870 875 880

ACA ACT GCA GCA ACA GAG TTG AAG AAA CTT GAT TTC AAA GTT TCT AGT 3702
Thr Thr Ala Ala Thr Glu Leu Lys Lys Leu Asp Phe Lys Val Ser Ser
885 890 895

ACA TCA AAT AAT CTG ATT TCA ACA ATT CCA TCA GAC AAT TTG GCA GCA 3750
Thr Ser Asn Asn Leu Ile Ser Thr Ile Pro Ser Asp Asn Leu Ala Ala
900 905 910 915

GGT ACT GAT AAT ACA AGT TCC TTA GGA CCC CCA AGT ATG CCA GTT CAT 3798
Gly Thr Asp Asn Thr Ser Ser Leu Gly Pro Pro Ser Met Pro Val His
920 925 930

TAT GAT AGT CAA TTA GAT ACC ACT CTA TTT GGC AAA AAG TCA TCT CCC 3846
Tyr Asp Ser Gln Leu Asp Thr Thr Leu Phe Gly Lys Lys Ser Ser Pro
935 940 945

CTT ACT GAG TCT GGT GGA CCT CTG AGC TTG AGT GAA GAA AAT AAT GAT 3894
Leu Thr Glu Ser Gly Gly Pro Leu Ser Leu Ser Glu Glu Asn Asn Asp
950 955 960

TCA AAG TTG TTA GAA TCA GGT TTA ATG AAT AGC CAA GAA AGT TCA TGG 3942
Ser Lys Leu Leu Glu Ser Gly Leu Met Asn Ser Gln Glu Ser Ser Trp
965 970 975

GGA AAA AAT GTA TCG TCA ACA GAG AGT GGT AGG TTA TTT AAA GGG AAA 3990
Gly Lys Asn Val Ser Ser Thr Glu Ser Gly Arg Leu Phe Lys Gly Lys
980 985 990 995

AGA GCT CAT GGA CCT GCT TTG TTG ACT AAA GAT AAT GCC TTA TTC AAA 4038
Arg Ala His Gly Pro Ala Leu Leu Thr Lys Asp Asn Ala Leu Phe Lys
1000 1005 1010

GTT AGC ATC TCT TTG TTA AAG ACA AAC AAA ACT TCC AAT AAT TCA GCA 4086
Val Ser Ile Ser Leu Leu Lys Thr Asn Lys Thr Ser Asn Asn Ser Ala
1015 1020 1025

ACT AAT AGA AAG ACT CAC ATT GAT GGC CCA TCA TTA TTA ATT GAG AAT 4134
Thr Asn Arg Lys Thr His Ile Asp Gly Pro Ser Leu Leu Ile Glu Asn
1030 1035 1040

AGT CCA TCA GTC TGG CAA AAT ATA TTA GAA AGT GAC ACT GAG TTT AAA 4182
Ser Pro Ser Val Trp Gln Asn Ile Leu Glu Ser Asp Thr Glu Phe Lys
1045 1050 1055

AAA GTG ACA CCT TTG ATT CAT GAC AGA ATG CTT ATG GAC AAA AAT GCT 4230
Lys Val Thr Pro Leu Ile His Asp Arg Met Leu Met Asp Lys Asn Ala
1060 1065 1070 1075

ACA GCT TTG AGG CTA AAT CAT ATG TCA AAT AAA ACT ACT TCA TCA AAA 4278
Thr Ala Leu Arg Leu Asn His Met Ser Asn Lys Thr Thr Ser Ser Lys
1080 1085 1090

AAC ATG GAA ATG GTC CAA CAG AAA AAA GAG GGC CCC ATT CCA CCA GAT 4326
Asn Met Glu Met Val Gln Gln Lys Lys Glu Gly Pro Ile Pro Pro Asp
1095 1100 1105

GCA CAA AAT CCA GAT ATG TCG TTC TTT AAG ATG CTA TTC TTG CCA GAA 4374
Ala Gln Asn Pro Asp Met Ser Phe Phe Lys Met Leu Phe Leu Pro Glu
1110 1115 1120

TCA GCA AGG TGG ATA CAA AGG ACT CAT GGA AAG AAC TCT CTG AAC TCT 4422
Ser Ala Arg Trp Ile Gln Arg Thr His Gly Lys Asn Ser Leu Asn Ser
1125 1130 1135

GGG CAA GGC CCC AGT CCA AAG CAA TTA GTA TCC TTA GGA CCA GAA AAA 4470
Gly Gln Gly Pro Ser Pro Lys Gln Leu Val Ser Leu Gly Pro Glu Lys
1140 1145 1150 1155

TCT GTG GAA GGT CAG AAT TTC TTG TCT GAG AAA AAC AAA GTG GTA GTA 4518
Ser Val Glu Gly Gln Asn Phe Leu Ser Glu Lys Asn Lys Val Val Val
1160 1165 1170

GGA AAG GGT GAA TTT ACA AAG GAC GTA GGA CTC AAA GAG ATG GTT TTT 4566
Gly Lys Gly Glu Phe Thr Lys Asp Val Gly Leu Lys Glu Met Val Phe
1175 1180 1185

CCA AGC AGC AGA AAC CTA TTT CTT ACT AAC TTG GAT AAT TTA CAT GAA 4614
Pro Ser Ser Arg Asn Leu Phe Leu Thr Asn Leu Asp Asn Leu His Glu
1190 1195 1200

AAT AAT ACA CAC AAT CAA GAA AAA AAA ATT CAG GAA GAA ATA GAA AAG 4662
Asn Asn Thr His Asn Gln Glu Lys Lys Ile Gln Glu Glu Ile Glu Lys
1205 1210 1215

AAG GAA ACA TTA ATC CAA GAG AAT GTA GTT TTG CCT CAG ATA CAT ACA 4710
Lys Glu Thr Leu Ile Gln Glu Asn Val Val Leu Pro Gln Ile His Thr
1220 1225 1230 1235

GTG ACT GGC ACT AAG AAT TTC ATG AAG AAC CTT TTC TTA CTG AGC ACT 4758
Val Thr Gly Thr Lys Asn Phe Met Lys Asn Leu Phe Leu Leu Ser Thr
1240 1245 1250

AGG CAA AAT GTA GAA GGT TCA TAT GAG GGG GCA TAT GCT CCA GTA CTT 4806
Arg Gln Asn Val Glu Gly Ser Tyr Glu Gly Ala Tyr Ala Pro Val Leu
1255 1260 1265

CAA GAT TTT AGG TCA TTA AAT GAT TCA ACA AAT AGA ACA AAG AAA CAC 4854
Gln Asp Phe Arg Ser Leu Asn Asp Ser Thr Asn Arg Thr Lys Lys His
1270 1275 1280

ACA GCT CAT TTC TCA AAA AAA GGG GAG GAA GAA AAC TTG GAA GGC TTG 4902
Thr Ala His Phe Ser Lys Lys Gly Glu Glu Glu Asn Leu Glu Gly Leu
1285 1290 1295

GGA AAT CAA ACC AAG CAA ATT GTA GAG AAA TAT GCA TGC ACC ACA AGG 4950
Gly Asn Gln Thr Lys Gln Ile Val Glu Lys Tyr Ala Cys Thr Thr Arg
1300 1305 1310 1315

ATA TCT CCT AAT ACA AGC CAG CAG AAT TTT GTC ACG CAA CGT AGT AAG 4998
Ile Ser Pro Asn Thr Ser Gln Gln Asn Phe Val Thr Gln Arg Ser Lys
1320 1325 1330

AGA GCT TTG AAA CAA TTC AGA CTC CCA CTA GAA GAA ACA GAA CTT GAA 5046
Arg Ala Leu Lys Gln Phe Arg Leu Pro Leu Glu Glu Thr Glu Leu Glu
1335 1340 1345

AAA AGG ATA ATT GTG GAT GAC ACC TCA ACC CAG TGG TCC AAA AAC ATG 5094
Lys Arg Ile Ile Val Asp Asp Thr Ser Thr Gln Trp Ser Lys Asn Met
1350 1355 1360

AAA CAT TTG ACC CCG AGC ACC CTC ACA CAG ATA GAC TAC AAT GAG AAG 5142
Lys His Leu Thr Pro Ser Thr Leu Thr Gln Ile Asp Tyr Asn Glu Lys
1365 1370 1375

GAG AAA GGG GCC ATT ACT CAG TCT CCC TTA TCA GAT TGC CTT ACG AGG 5190
Glu Lys Gly Ala Ile Thr Gln Ser Pro Leu Ser Asp Cys Leu Thr Arg
1380 1385 1390 1395

AGT CAT AGC ATC CCT CAA GCA AAT AGA TCT CCA TTA CCC ATT GCA AAG 5238
Ser His Ser Ile Pro Gln Ala Asn Arg Ser Pro Leu Pro Ile Ala Lys
1400 1405 1410

GTA TCA TCA TTT CCA TCT ATT AGA CCT ATA TAT CTG ACC AGG GTC CTA 5286
Val Ser Ser Phe Pro Ser Ile Arg Pro Ile Tyr Leu Thr Arg Val Leu
1415 1420 1425

TTC CAA GAC AAC TCT TCT CAT CTT CCA GCA GCA TCT TAT AGA AAG AAA 5334
Phe Gln Asp Asn Ser Ser His Leu Pro Ala Ala Ser Tyr Arg Lys Lys
1430 1435 1440

GAT TCT GGG GTC CAA GAA AGC AGT CAT TTC TTA CAA GGA GCC AAA AAA 5382
Asp Ser Gly Val Gln Glu Ser Ser His Phe Leu Gln Gly Ala Lys Lys
1445 1450 1455

AAT AAC CTT TCT TTA GCC ATT CTA ACC TTG GAG ATG ACT GGT GAT CAA 5430
Asn Asn Leu Ser Leu Ala Ile Leu Thr Leu Glu Met Thr Gly Asp Gln
1460 1465 1470 1475

AGA GAG GTT GGC TCC CTG GGG ACA AGT GCC ACA AAT TCA GTC ACA TAC 5478
Arg Glu Val Gly Ser Leu Gly Thr Ser Ala Thr Asn Ser Val Thr Tyr
1480 1485 1490

AAG AAA GTT GAG AAC ACT GTT CTC CCG AAA CCA GAC TTG CCC AAA ACA 5526
Lys Lys Val Glu Asn Thr Val Leu Pro Lys Pro Asp Leu Pro Lys Thr
1495 1500 1505

TCT GGC AAA GTT GAA TTG CTT CCA AAA GTT CAC ATT TAT CAG AAG GAC 5574
Ser Gly Lys Val Glu Leu Leu Pro Lys Val His Ile Tyr Gln Lys Asp
1510 1515 1520

CTA TTC CCT ACG GAA ACT AGC AAT GGG TCT CCT GGC CAT CTG GAT CTC 5622
Leu Phe Pro Thr Glu Thr Ser Asn Gly Ser Pro Gly His Leu Asp Leu
1525 1530 1535

GTG GAA GGG AGC CTT CTT CAG GGA ACA GAG GGA GCG ATT AAG TGG AAT 5670
Val Glu Gly Ser Leu Leu Gln Gly Thr Glu Gly Ala Ile Lys Trp Asn
1540 1545 1550 1555

GAA GCA AAC AGA CCT GGA AAA GTT CCC TTT CTG AGA GTA GCA ACA GAA 5718
Glu Ala Asn Arg Pro Gly Lys Val Pro Phe Leu Arg Val Ala Thr Glu
1560 1565 1570

AGC TCT GCA AAG ACT CCC TCC AAG CTA TTG GAT CCT CTT GCT TGG GAT 5766
Ser Ser Ala Lys Thr Pro Ser Lys Leu Leu Asp Pro Leu Ala Trp Asp
1575 1580 1585

AAC CAC TAT GGT ACT CAG ATA CCA AAA GAA GAG TGG AAA TCC CAA GAG 5814
Asn His Tyr Gly Thr Gln Ile Pro Lys Glu Glu Trp Lys Ser Gln Glu
1590 1595 1600

AAG TCA CCA GAA AAA ACA GCT TTT AAG AAA AAG GAT ACC ATT TTG TCC 5862
Lys Ser Pro Glu Lys Thr Ala Phe Lys Lys Lys Asp Thr Ile Leu Ser
1605 1610 1615

CTG AAC GCT TGT GAA AGC AAT CAT GCA ATA GCA GCA ATA AAT GAG GGA 5910
Leu Asn Ala Cys Glu Ser Asn His Ala Ile Ala Ala Ile Asn Glu Gly
1620 1625 1630 1635

CAA AAT AAG CCC GAA ATA GAA GTC ACC TGG GCA AAG CAA GGT AGG ACT 5958
Gln Asn Lys Pro Glu Ile Glu Val Thr Trp Ala Lys Gln Gly Arg Thr
1640 1645 1650

GAA AGG CTG TGC TCT CAA AAC CCA CCA GTC TTG AAA CGC CAT CAA CGG 6006
Glu Arg Leu Cys Ser Gln Asn Pro Pro Val Leu Lys Arg His Gln Arg
1655 1660 1665

GAA ATA ACT CGT ACT ACT CTT CAG TCA GAT CAA GAG GAA ATT GAC TAT 6054
Glu Ile Thr Arg Thr Thr Leu Gln Ser Asp Gln Glu Glu Ile Asp Tyr
1670 1675 1680

GAT GAT ACC ATA TCA GTT GAA ATG AAG AAG GAA GAT TTT GAC ATT TAT 6102
Asp Asp Thr Ile Ser Val Glu Met Lys Lys Glu Asp Phe Asp Ile Tyr
1685 1690 1695

GAT GAG GAT GAA AAT CAG AGC CCC CGC AGC TTT CAA AAG AAA ACA CGA 6150
Asp Glu Asp Glu Asn Gln Ser Pro Arg Ser Phe Gln Lys Lys Thr Arg
1700 1705 1710 1715

CAC TAT TTT ATT GCT GCA GTG GAG AGG CTC TGG GAT TAT GGG ATG AGT 6198
His Tyr Phe Ile Ala Ala Val Glu Arg Leu Trp Asp Tyr Gly Met Ser
1720 1725 1730

AGC TCC CCA CAT GTT CTA AGA AAC AGG GCT CAG AGT GGC AGT GTC CCT 6246
Ser Ser Pro His Val Leu Arg Asn Arg Ala Gln Ser Gly Ser Val Pro
1735 1740 1745

CAG TTC AAG AAA GTT GTT TTC CAG GAA TTT ACT GAT GGC TCC TTT ACT 6294
Gln Phe Lys Lys Val Val Phe Gln Glu Phe Thr Asp Gly Ser Phe Thr
1750 1755 1760

CAG CCC TTA TAC CGT GGA GAA CTA AAT GAA CAT TTG GGA CTC CTG GGG 6342
Gln Pro Leu Tyr Arg Gly Glu Leu Asn Glu His Leu Gly Leu Leu Gly
1765 1770 1775

CCA TAT ATA AGA GCA GAA GTT GAA GAT AAT ATC ATG GTA ACT TTC AGA 6390
Pro Tyr Ile Arg Ala Glu Val Glu Asp Asn Ile Met Val Thr Phe Arg
1780 1785 1790 1795

AAT CAG GCC TCT CGT CCC TAT TCC TTC TAT TCT AGC CTT ATT TCT TAT 6438
Asn Gln Ala Ser Arg Pro Tyr Ser Phe Tyr Ser Ser Leu Ile Ser Tyr
1800 1805 1810

GAG GAA GAT CAG AGG CAA GGA GCA GAA CCT AGA AAA AAC TTT GTC AAG 6486
Glu Glu Asp Gln Arg Gln Gly Ala Glu Pro Arg Lys Asn Phe Val Lys
1815 1820 1825

CCT AAT GAA ACC AAA ACT TAC TTT TGG AAA GTG CAA CAT CAT ATG GCA 6534
Pro Asn Glu Thr Lys Thr Tyr Phe Trp Lys Val Gln His His Met Ala
1830 1835 1840

CCC ACT AAA GAT GAG TTT GAC TGC AAA GCC TGG GCT TAT TTC TCT GAT 6582
Pro Thr Lys Asp Glu Phe Asp Cys Lys Ala Trp Ala Tyr Phe Ser Asp
1845 1850 1855

GTT GAC CTG GAA AAA GAT GTG CAC TCA GGC CTG ATT GGA CCC CTT CTG 6630
Val Asp Leu Glu Lys Asp Val His Ser Gly Leu Ile Gly Pro Leu Leu
1860 1865 1870 1875

GTC TGC CAC ACT AAC ACA CTG AAC CCT GCT CAT GGG AGA CAA GTG ACA 6678
Val Cys His Thr Asn Thr Leu Asn Pro Ala His Gly Arg Gln Val Thr
1880 1885 1890

GTA CAG GAA TTT GCT CTG TTT TTC ACC ATC TTT GAT GAG ACC AAA AGC 6726
Val Gln Glu Phe Ala Leu Phe Phe Thr Ile Phe Asp Glu Thr Lys Ser
1895 1900 1905

TGG TAC TTC ACT GAA AAT ATG GAA AGA AAC TGC AGG GCT CCC TGC AAT 6774
Trp Tyr Phe Thr Glu Asn Met Glu Arg Asn Cys Arg Ala Pro Cys Asn
1910 1915 1920

ATC CAG ATG GAA GAT CCC ACT TTT AAA GAG AAT TAT CGC TTC CAT GCA 6822
Ile Gln Met Glu Asp Pro Thr Phe Lys Glu Asn Tyr Arg Phe His Ala
1925 1930 1935

ATC AAT GGC TAC ATA ATG GAT ACA CTA CCT GGC TTA GTA ATG GCT CAG 6870
Ile Asn Gly Tyr Ile Met Asp Thr Leu Pro Gly Leu Val Met Ala Gln
1940 1945 1950 1955

GAT CAA AGG ATT CGA TGG TAT CTG CTC AGC ATG GGC AGC AAT GAA AAC 6918
Asp Gln Arg Ile Arg Trp Tyr Leu Leu Ser Met Gly Ser Asn Glu Asn
1960 1965 1970

ATC CAT TCT ATT CAT TTC AGT GGA CAT GTG TTC ACT GTA CGA AAA AAA 6966
Ile His Ser Ile His Phe Ser Gly His Val Phe Thr Val Arg Lys Lys
1975 1980 1985

GAG GAG TAT AAA ATG GCA CTG TAC AAT CTC TAT CCA GGT GTT TTT GAG 7014
Glu Glu Tyr Lys Met Ala Leu Tyr Asn Leu Tyr Pro Gly Val Phe Glu
1990 1995 2000

ACA GTG GAA ATG TTA CCA TCC AAA GCT GGA ATT TGG CGG GTG GAA TGC 7062
Thr Val Glu Met Leu Pro Ser Lys Ala Gly Ile Trp Arg Val Glu Cys
2005 2010 2015

CTT ATT GGC GAG CAT CTA CAT GCT GGG ATG AGC ACA CTT TTT CTG GTG 7110
Leu Ile Gly Glu His Leu His Ala Gly Met Ser Thr Leu Phe Leu Val
2020 2025 2030 2035

TAC AGC AAT AAG TGT CAG ACT CCC CTG GGA ATG GCT TCT GGA CAC ATT 7158
Tyr Ser Asn Lys Cys Gln Thr Pro Leu Gly Met Ala Ser Gly His Ile
2040 2045 2050

AGA GAT TTT CAG ATT ACA GCT TCA GGA CAA TAT GGA CAG TGG GCC CCA 7206
Arg Asp Phe Gln Ile Thr Ala Ser Gly Gln Tyr Gly Gln Trp Ala Pro
2055 2060 2065

AAG CTG GCC AGA CTT CAT TAT TCC GGA TCA ATC AAT GCC TGG AGC ACC 7254
Lys Leu Ala Arg Leu His Tyr Ser Gly Ser Ile Asn Ala Trp Ser Thr
2070 2075 2080

AAG GAG CCC TTT TCT TGG ATC AAG GTG GAT CTG TTG GCA CCA ATG ATT 7302
Lys Glu Pro Phe Ser Trp Ile Lys Val Asp Leu Leu Ala Pro Met Ile
2085 2090 2095

ATT CAC GGC ATC AAG ACC CAG GGT GCC CGT CAG AAG TTC TCC AGC CTC 7350
Ile His Gly Ile Lys Thr Gln Gly Ala Arg Gln Lys Phe Ser Ser Leu
2100 2105 2110 2115

TAC ATC TCT CAG TTT ATC ATC ATG TAT AGT CTT GAT GGG AAG AAG TGG 7398
Tyr Ile Ser Gln Phe Ile Ile Met Tyr Ser Leu Asp Gly Lys Lys Trp
2120 2125 2130

CAG ACT TAT CGA GGA AAT TCC ACT GGA ACC TTA ATG GTC TTC TTT GGC 7446
Gln Thr Tyr Arg Gly Asn Ser Thr Gly Thr Leu Met Val Phe Phe Gly
2135 2140 2145

AAT GTG GAT TCA TCT GGG ATA AAA CAC AAT ATT TTT AAC CCT CCA ATT 7494
Asn Val Asp Ser Ser Gly Ile Lys His Asn Ile Phe Asn Pro Pro Ile
2150 2155 2160

ATT GCT CGA TAC ATC CGT TTG CAC CCA ACT CAT TAT AGC ATT CGC AGC 7542
Ile Ala Arg Tyr Ile Arg Leu His Pro Thr His Tyr Ser Ile Arg Ser
2165 2170 2175

ACT CTT CGC ATG GAG TTG ATG GGC TGT GAT TTA AAT AGT TGC AGC ATG 7590
Thr Leu Arg Met Glu Leu Met Gly Cys Asp Leu Asn Ser Cys Ser Met
2180 2185 2190 2195

CCA TTG GGA ATG GAG AGT AAA GCA ATA TCA GAT GCA CAG ATT ACT GCT 7638
Pro Leu Gly Met Glu Ser Lys Ala Ile Ser Asp Ala Gln Ile Thr Ala
2200 2205 2210

TCA TCC TAC TTT ACC AAT ATG TTT GCC ACC TGG TCT CCT TCA AAA GCT 7686
Ser Ser Tyr Phe Thr Asn Met Phe Ala Thr Trp Ser Pro Ser Lys Ala
2215 2220 2225

CGA CTT CAC CTC CAA GGG AGG AGT AAT GCC TGG AGA CCT CAG GTG AAT 7734
Arg Leu His Leu Gln Gly Arg Ser Asn Ala Trp Arg Pro Gln Val Asn
2230 2235 2240

AAT CCA AAA GAG TGG CTG CAA GTG GAC TTC CAG AAG ACA ATG AAA GTC 7782
Asn Pro Lys Glu Trp Leu Gln Val Asp Phe Gln Lys Thr Met Lys Val
2245 2250 2255

ACA GGA GTA ACT ACT CAG GGA GTA AAA TCT CTG CTT ACC AGC ATG TAT 7830
Thr Gly Val Thr Thr Gln Gly Val Lys Ser Leu Leu Thr Ser Met Tyr
2260 2265 2270 2275

GTG AAG GAG TTC CTC ATC TCC AGC AGT CAA GAT GGC CAT CAG TGG ACT 7878
Val Lys Glu Phe Leu Ile Ser Ser Ser Gln Asp Gly His Gln Trp Thr
2280 2285 2290

CTC TTT TTT CAG AAT GGC AAA GTA AAG GTT TTT CAG GGA AAT CAA GAC 7926
Leu Phe Phe Gln Asn Gly Lys Val Lys Val Phe Gln Gly Asn Gln Asp
2295 2300 2305

TCC TTC ACA CCT GTG GTG AAC TCT CTA GAC CCA CCG TTA CTG ACT CGC 7974
Ser Phe Thr Pro Val Val Asn Ser Leu Asp Pro Pro Leu Leu Thr Arg
2310 2315 2320

TAC CTT CGA ATT CAC CCC CAG AGT TGG GTG CAC CAG ATT GCC CTG AGG 8022
Tyr Leu Arg Ile His Pro Gln Ser Trp Val His Gln Ile Ala Leu Arg
2325 2330 2335

ATG GAG GTT CTG GGC TGC GAG GCA CAG GAC CTC TAC TGAGGGTGGC 8068
Met Glu Val Leu Gly Cys Glu Ala Gln Asp Leu Tyr
2340 2345 2350

CACTGCAGCA CCTGCCACTG CCGTCACCTC TCCCTCCTCA GCTCCAGGGC AGTGTCCCTC 8128

CCTGGCTTGC CTTCTACCTT TGTGCTAAAT CCTAGCAGAC ACTGCCTTGA AGCCTCCTGA 8188

ATTAACTATC ATCAGTCCTG CATTTCTTTG GTGGGGGGCC AGGAGGGTGC ATCCAATTTA 8248

ACTTAACTCT TACCTATTTT CTGCAGCTGC TCCCAGATTA CTCCTTCCTT CCAATATAAC 8308

TAGGCAAAAA GAAGTGAGGA GAAACCTGCA TGAAAGCATT CTTCCCTGAA AAGTTAGGCC 8368

TCTCAGAGTC ACCACTTCCT CTGTTGTAGA AAAACTATGT GATGAAACTT TGAAAAAGAT 8428

ATTTATGATG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 8488

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 8548

ATCAATGTAT CTTATCATGT CTGGATCCCC GGGTGGCATC CCTGTGACCC CTCCCCAGTG 8608

CCTCTCCTGG CCCTGGAAGT TGCCACTCCA GTGCCCACCA GCCTTGTCCT AATAAAATTA 8668

AGTTGCATCA TTTTGTCTGA CTAGGTGTCC TTCTATAATA TTATGGGGTG GAGGGGGGTG 8728

GTATGGAGCA AGGGGCAAGT TGGGAAGACA ACCTGTAGGG CCTGCGGGGT CTATTCGGGA 8788

ACCAAGCTGG AGTGCAGTGG CACAATCTTG GCTCACTGCA ATCTCCGCCT CCTGGGTTCA 8848

AGCGATTCTC CTGCCTCAGC CTCCCGAGTT GTTGGGATTC CAGGCATGCA TGACCAGGCT 8908

CAGCTAATTT TTGTTTTTTT GGTAGAGACG GGGTTTCACC ATATTGGCCA GGCTGGTCTC 8968

CAACTCCTAA TCTCAGGTGA TCTACCCACC TTGGCCTCCC AAATTGCTGG GATTACAGGC 9028

GTGAACCACT GCTCCCTTCC CTGTCCTTCT GATTTTAAAA TAACTATACC AGCAGGAGGA 9088

CGTCCAGACA CAGCATAGGC TACCTGCCAT GCCCAACCGG TGGGACATTT GAGTTGCTTG 9148

CTTGGCACTG TCCTCTCATG CGTTGGGTCC ACTCAGTAGA TGCCTGTTGA ATTCGTAATC 9208

ATGGTCATAG CTGTTTCCTG TGTGAAATTG TTATCCGCTC ACAATTCCAC ACAACATACG 9268

AGCCGGAAGC ATAAAGTGTA AAGCCTGGGG TGCCTAATGA GTGAGCTAAC TCACATTAAT 9328

TGCGTTGCGC TCACTGCCCG CTTTCCAGTC GGGAAACCTG TCGTGCCAGC TGCATTAATG 9388

AATCGGCCAA CGCGCGGGGA GAGGCGGTTT GCGTATTGGG CGCTCTTCCG CTTCCTCGCT 9448

CACTGACTCG CTGCGCTCGG TCGTTCGGCT GCGGCGAGCG GTATCAGCTC ACTCAAAGGC 9508 

GGTAATACGG TTATCCACAG AATCAGGGGA TAACGCAGGA AAGAACATGT GAGCAAAAGG 9568 

CCAGCAAAAG GCCAGGAACC GTAAAAAGGC CGCGTTGCTG GCGTTTTTCC ATAGGCTCCG 9628 

CCCCCCTGAC GAGCATCACA AAAATCGACG CTCAAGTCAG AGGTGGCGAA ACCCGACAGG 9688 

ACTATAAAGA TACCAGGCGT TTCCCCCTGG AAGCTCCCTC GTGCGCTCTC CTGTTCCGAC 9748

CCTGCCGCTT ACCGGATACC TGTCCGCCTT TCTCCCTTCG GGAAGCGTGG CGCTTTCTCA 9808

TAGCTCACGC TGTAGGTATC TCAGTTCGGT GTAGGTCGTT CGCTCCAAGC TGGGCTGTGT 9868

GCACGAACCC CCCGTTCAGC CCGACCGCTG CGCCTTATCC GGTAACTATC GTCTTGAGTC 9928

CAACCCGGTA AGACACGACT TATCGCCACT GGCAGCAGCC ACTGGTAACA GGATTAGCAG 9988

AGCGAGGTAT GTAGGCGGTG CTACAGAGTT CTTGAAGTGG TGGCCTAACT ACGGCTACAC 10048

TAGAAGGACA GTATTTGGTA TCTGCGCTCT GCTGAAGCCA GTTACCTTCG GAAAAAGAGT 10108

TGGTAGCTCT TGATCCGGCA AACAAACCAC CGCTGGTAGC GGTGGTTTTT TTGTTTGCAA 10168 

GCAGCAGATT ACGCGCAGAA AAAAAGGATC TCAAGAAGAT CCTTTGATCT TTTCTACGGG 10228

GTCTGACGCT CAGTGGAACG AAAACTCACG TTAAGGGATT TTGGTCATGA GATTATCAAA 10288

AAGGATCTTC ACCTAGATCC TTTTAAATTA AAAATGAAGT TTTAAATCAA TCTAAAGTAT 10348

ATATGAGTAA ACTTGGTCTG ACAGTTACCA ATGCTTAATC AGTGAGGCAC CTATCTCAGC 10408

GATCTGTCTA TTTCGTTCAT CCATAGTTGC CTGACTCCCC GTCGTGTAGA TAACTACGAT 10468

ACGGGAGGGC TTACCATCTG GCCCCAGTGC TGCAATGATA CCGCGAGACC CACGCTCACC 10528

GGCTCCAGAT TTATCAGCAA TAAACCAGCC AGCCGGAAGG GCCGAGCGCA GAAGTGGTCC 10588

TGCAACTTTA TCCGCCTCCA TCCAGTCTAT TAATTGTTGC CGGGAAGCTA GAGTAAGTAG 10648

TTCGCCAGTT AATAGTTTGC GCAACGTTGT TGCCATTGCT ACAGGCATCG TGGTGTCACG 10708

CTCGTCGTTT GGTATGGCTT CATTCAGCTC CGGTTCCCAA CGATCAAGGC GAGTTACATG 10768

ATCCCCCATG TTGTGCAAAA AAGCGGTTAG CTCCTTCGGT CCTCCGATCG TTGTCAGAAG 10828

TAAGTTGGCC GCAGTGTTAT CACTCATGGT TATGGCAGCA CTGCATAATT CTCTTACTGT 10888

CATGCCATCC GTAAGATGCT TTTCTGTGAC TGGTGAGTAC TCAACCAAGT CATTCTGAGA 10948

ATAGTGTATG CGGCGACCGA GTTGCTCTTG CCCGGCGTCA ATACGGGATA ATACCGCGCC 11008

ACATAGCAGA ACTTTAAAAG TGCTCATCAT TGGAAAACGT TCTTCGGGGC GAAAACTCTC 11068

AAGGATCTTA CCGCTGTTGA GATCCAGTTC GATGTAACCC ACTCGTGCAC CCAACTGATC 11128

TTCAGCATCT TTTACTTTCA CCAGCGTTTC TGGGTGAGCA AAAACAGGAA GGCAAAATGC 11188

CGCAAAAAAG GGAATAAGGG CGACACGGAA ATGTTGAATA CTCATACTCT TCCTTTTTCA 11248

ATATTATTGA AGCATTTATC AGGGTTATTG TCTCATGAGC GGATACATAT TTGAATGTAT 11308

TTAGAAAAAT AAACAAATAG GGGTTCCGCG CACATTTCCC CGAAAAGTGC CACCTGACGT 11368

CTAAGAAACC ATTATTATCA TGACATTAAC CTATAAAAAT AGGCGTATCA CGAGGCCCTT 11428

TCGTCTCGCG CGTTTCGGTG ATGACGGTGA AAACCTCTGA CACATGCAGC TCCCGGAGAC 11488

GGTCACAGCT TGTCTGTAAG CGGATGCCGG GAGCAGACAA GCCCGTCAGG GCGCGTCAGC 11548

GGGTGTTGGC GGGTGTCGGG GCTGGCTTAA CTATGCGGCA TCAGAGCAGA TTGTACTGAG 11608

AGTGCACCAT ATGCGGTGTG AAATACCGCA CAGATGCGTA AGGAGAAAAT ACCGCATCAG 11668

GCGCCATTCG CCATTCAGGC TGCGCAACTG TTGGGAAGGG CGATCGGTGC GGGCCTCTTC 11728

GCTATTACGC CAGCTGGCGA AAGGGGGATG TGCTGCAAGG CGATTAAGTT GGGTAACGCC 11788

AGGGTTTTCC CAGTCACGAC GTTGTAAAAC GACGGCCAGT GCCAAGCTTG GGCTGCAG 11846

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 211 base pairs



(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
ATTGAACCAA GAAGCTTCTC CCAGGTAAGT TGCTAATAAA GCTTGGCAAG AGTATTTCAA 60 
GGAAGATGAA GTCATTAACT ATGCAAAATG CTTCTCAGGC ACCTAGGAAA ATGAGGATGT 120
GAGGCATTTC TACCCACTTG GTACATAAAA TTATTGCTTT TCCTCTTCTT TTTTTCTCCA 180 
GAACCCACCA GTCTTGAAAC GCCATCAACG G 211 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 126 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
GTTGGTATCC TTTTTACAGC ACAACTTAAT GAGACAGATA GAAACTGGTC TTGTAGAAAC 60 
AGAGTAGTCG CCTGCTTTTC TGCCAGGTGC TGACTTCTCT CCCCTGGGCT GTTTTCATTT 120 
TCTCAG 126

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 126 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
GTAAGTATCC TTTTTACAGC ACAACTTAAT GAGACAGATA GAAACTGGTC TTGTAGAAAC 60 
AGAGTAGTCG CCTGCTTTTC TGCCAGGTGC TGACTTCTCT CCCCTTCTCT TTTTTCCTTT 120 
TCTCAG 126

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
GCCACCAUGG 10 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 100 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
AGGTTAATTT TTAAAAAGCA GTCAAAAGTC CAAGTGGCCC TTGCGAGCAT TTACTCTCTC 60 
TGTTTGCTCT GGTTAATAAT CTCAGGAGCA CAAACATTCC 100 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 223 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
CTTTCTCTTT TCTTTTACAT GAAGGGTCTG GCAGCCAAAG CAATCACTCA AAGTTCAAAC 60
CTTATCATTT TTTGCTTTGT TCCTCTTGGC CTTGGTTTTG TACATCAGCT TTGAAAATAC 120
CATCCCAGGG TTAATGCTGG GGTTAATTTA TAACTAAGAG TGCTCTAGTT TTGCAATACA 180 
GGACATGCTA TAAAAATGGA AAGATGTTGC TTTCTGAGAG ATA 223 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 90 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
AGAUCUCGAG AAAGCUAACA ACAAAGAACA ACAAACAACA AUCAGGAUAA CAAGAACGAA 60 
ACAAUAACAG CCACCAUGGA AAUAGAGCUC 90 